Submitted to the Annals of Statistics.



ESTIMATION OF A K-MONOTONE DENSITY, PART 1:
CHARACTERIZATIONS, CONSISTENCY, AND MINIMAX
LOWER BOUNDS

By Fadoua Balabdaoui and Jon A. Wellner

University of Göttingen and University of Washington

Shape constrained densities are encountered in many nonparametric
estimation problems. The classes of monotone or convex (and
monotone) densities can be viewed as special cases of the classes
of $k$-monotone densities. A density $g$ is said to be $k$-monotone if
$(-1)^l g^{(l)}$ is nonnegative, nonincreasing and convex for
$l = 0, \ldots, k-2$ if $k \ge 2$, and $g$ is simply nonincreasing if
$k = 1$. These classes of shape constrained densities bridge the gap
between the classes of monotone (1-monotone) and convex decreasing
(2-monotone) densities for which asymptotic results are known, and the
class of completely monotone ($\infty$-monotone) densities. It is
well-known that a density is completely monotone if and only if it is a
scale mixture of exponential densities (Bernstein's theorem). Thus one
motivation for studying the problem of estimation of a $k$-monotone
density is to try to gain insight into the problem of estimating a
completely monotone density.

In this series of four papers we consider both (nonparametric)
Maximum Likelihood estimators and Least Squares estimators of a
$k$-monotone density. In this first part (part 1), we prove existence
of the estimators and give characterizations. We also establish
consistency properties, and show that the estimators are splines of
order $k$ (degree $k-1$) with simple knots. We further provide
asymptotic minimax risk lower bounds for estimating a $k$-monotone
density $g_0(x_0)$ and its derivatives $g_0^{(j)}(x_0)$,
$j = 1, \ldots, k-1$, at a fixed point $x_0$ under the assumption that
$(-1)^k g_0^{(k)}(x_0) > 0$.

Part 2 of the series gives algorithms for computation of the
estimators and an application of the methods to earthquake aftershock
data. In part 3 we describe and establish existence of the limiting
process $H_k$ which governs the asymptotic distribution theory
modulo a certain conjecture involving a Hermite interpolation problem.
In part 4 we give the limiting distribution theory in terms of $H_k$,
again modulo the same Hermite interpolation problem.

1. Introduction. Shape constrained densities are encountered in many 
nonparametric estimation problems. Monotone densities arise naturally via 
connections with renewal theory and uniform mixing (see Vardi (1989) and 
Woodroofe and Sun (1993) for examples of the former, and Woodroofe 
and Sun (1993) for the latter in an astronomical context). Convex densities 
arise in connection with Poisson process models for bird migration and scale 
mixtures of triangular densities; see e.g. Hampel (1987), Anevski (2003), 
and Lavee, Safrie, and Meilijson (1991). 

Estimation of monotone densities on the positive half-line
$\mathbb{R}^+ = [0, \infty)$ was initiated by Grenander (1956) (with
related work by Ayer, Brunk, Ewing, Reid, and Silverman (1955), Brunk
(1958), and Van Eeden (1956), Van Eeden (1957)). Asymptotic theory of
the maximum likelihood estimators was developed by Prakasa Rao (1969)
with later contributions by Groeneboom (1985), Groeneboom (1989),
Birgé (1987), Birgé (1989), and Kim and Pollard (1990).

Estimation of convex densities on $\mathbb{R}^+$ was apparently initiated by Anevski
(1994) (see also Anevski (2003)), and was pursued by Wang (1994) and 
Jongbloed (1995). The limit distribution theory for the (nonparametric) 
maximum likelihood estimator and its first derivative at a fixed point was 
obtained by Groeneboom, Jongbloed, and Wellner (2001b). 

Our goal here (and in the accompanying papers Balabdaoui and
Wellner (2004a), Balabdaoui and Wellner (2004b), and Balabdaoui and
Wellner (2004c)) is to develop nonparametric estimators and asymptotic
theory for the classes of $k$-monotone densities on $[0, \infty)$
defined as follows: $g$ is a $k$-monotone density on $(0, \infty)$ if
$g$ is nonnegative and $(-1)^l g^{(l)}$ is nonincreasing and convex for
$l \in \{0, \ldots, k-2\}$ for $k \ge 2$, and simply nonnegative and
nonincreasing when $k = 1$. As will be shown in section 2, it
follows from the results of Williamson (1956), Lévy (1962), and Gneiting
(1999) that $g$ is a $k$-monotone density if and only if it can be
represented as a scale mixture of Beta$(1, k)$ densities; i.e. with
$x_+ = x 1\{x \ge 0\}$,

$$ g(x) = \int_0^\infty \frac{k}{y^k} (y - x)_+^{k-1} \, dF(y) $$

for some distribution function F on (0, oo). Note that for k = 1 this recovers 
the well known fact that monotone densities are in a one-to-one correspon- 
dence with scale mixtures of uniform densities, and, for k = 2, the corre- 
sponding fact frequently used by Groeneboom, Jongbloed, and Wellner 
(2001b) that convex decreasing densities are in a one-to-one correspondence 
with scale mixtures of the triangular, or Beta$(1,2)$, densities.
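
As a numerical illustration of the mixture representation, the following sketch evaluates the scale mixture of Beta$(1,k)$ densities by a midpoint rule. The choice of the Gamma$(k+1, 1)$ distribution as mixing distribution $F$, the helper names, and the quadrature settings are assumptions made for this example only; with that choice the mixture should reproduce the standard exponential density $e^{-x}$, which is $k$-monotone for every $k$.

```python
import math


def beta_scale_kernel(x, y, k):
    """Density at x of y * Beta(1, k), i.e. k (y - x)_+^{k-1} / y^k."""
    if x < 0.0 or x >= y:
        return 0.0
    return k * (y - x) ** (k - 1) / y ** k


def gamma_pdf(y, shape):
    """Gamma(shape, 1) density for an integer shape parameter."""
    return y ** (shape - 1) * math.exp(-y) / math.factorial(shape - 1)


def mixture_density(x, k, upper=60.0, steps=60000):
    """Midpoint-rule value of int k (y - x)_+^{k-1} / y^k dF(y), F = Gamma(k+1, 1)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += beta_scale_kernel(x, y, k) * gamma_pdf(y, k + 1)
    return total * h


# Gap between the mixture and exp(-x) at x = 0.7, for k = 1, 2, 3.
gaps = {k: abs(mixture_density(0.7, k) - math.exp(-0.7)) for k in (1, 2, 3)}
```

The gaps are limited only by the quadrature error, which is largest for $k = 1$ because the uniform kernel is discontinuous at $y = x$.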

Our motivation for studying nonparametric estimation in the classes
$\mathcal{D}_k$ has several components: besides the obvious goal of
generalizing the existing theory for the 1-monotone (i.e. monotone) and
2-monotone (i.e. convex and decreasing) classes $\mathcal{D}_1$ and
$\mathcal{D}_2$, these classes play an important role in
several extensions of Hampel's bird migration problem which are discussed
further in Balabdaoui and Wellner (2005a). They also provide a potential
link to the important limiting case of the $k$-monotone classes, namely
the class $\mathcal{D}_\infty$ of completely monotone densities.
Densities $g$ in $\mathcal{D}_\infty$ have the property that
$(-1)^l g^{(l)}(x) \ge 0$ for all $x \in (0, \infty)$ and
$l \in \{0, 1, \ldots\}$. It follows from Bernstein's theorem (see e.g.
Feller (1971), page 439, or Gneiting (1998)) that
$g \in \mathcal{D}_\infty$ if and only if it can be represented as a
scale mixture of exponential densities; i.e.

$$ g(x) = \int_0^\infty \frac{1}{y} \exp(-x/y) \, dF(y) $$

for some distribution function $F$ on $(0, \infty)$. Completely monotone
densities arise naturally in connection with mixtures of Poisson
processes and have been used in reliability theory (see e.g. Harris and
Singpurwalla (1968), Doyle, Hansen, and McNolty (1980), Hill, Saunders,
and Laud (1980)), and empirical Bayes estimation (see Robbins (1964) and
Robbins (1980)). Jewell (1982) initiated the study of maximum likelihood
estimation in the family $\mathcal{D}_\infty$ and succeeded in showing
that the MLE $\hat F_n$ of the mixing distribution function $F$ is unique
and almost surely weakly consistent. Although consistency of the MLE
follows now rather easily from the results of Pfanzagl (1988) and van de
Geer (1993), little is known about rates of convergence or asymptotic
distribution theory for either the estimator $\hat g_n$ of the mixed
density $g$ (the "forward" or "direct" problem) or the estimator
$\hat F_n$ of the mixing distribution function $F$ (the "inverse"
problem). Although our present methods do not yield solutions of these
difficult questions, the development of methods and theory for general
$k$-monotone densities may throw some light on the issues and problems.

Now we briefly describe the contents of the four related papers of which 
the present manuscript is part 1. 

In this paper (part 1), we consider the Maximum Likelihood $\hat g_n$ and
Least Squares $\tilde g_n$ estimators of a density
$g_0 \in \mathcal{D}_k$ for a fixed integer $k \ge 2$ based on a sample
$X_1, \ldots, X_n$ i.i.d. with density $g_0$. We show that the estimators
exist, provide characterizations, and establish consistency of the
estimators and their derivatives $\hat g_n^{(j)}$ and $\tilde g_n^{(j)}$
for $j \in \{1, \ldots, k-1\}$ (uniformly on closed sets bounded away
from 0). In section 4 we establish asymptotic minimax lower bounds for
estimation of $g_0^{(j)}(x_0)$, $j = 0, \ldots, k-1$, under the
assumption that $g_0^{(k)}(x_0)$ exists and is non-zero. In part 1 we
also include statements of known results for estimation of a completely
monotone density $g_0 \in \mathcal{D}_\infty$ whenever possible. One of
the remaining open questions concerns existence of the least squares
estimator; see Section 2. In section 5 we illustrate both the maximum
likelihood and least squares estimators for $k = 3$ and $k = 6$ in both
the direct and inverse problems via artificial data generated from a
standard exponential distribution.

In part 2 (Balabdaoui and Wellner (2004a)) we provide algorithms
for computation of the estimators and for computation of (approximations
to) the limit process $H_k$ defined in part 3 (Balabdaoui and Wellner
(2004b)). We call the basic algorithm developed and used in part 2 an
iterative $(2k-1)$-spline algorithm since it extends the "cubic spline
algorithm" developed in Groeneboom, Jongbloed, and Wellner (2001a) and
Groeneboom, Jongbloed, and Wellner (2003). Part 3 is devoted to a study
of the corresponding canonical Gaussian problem and the "invelope" ($k$
even) or envelope ($k$ odd) processes
$H_k = \lim_{c \to \infty} H_{c,k}$ which arise in the solution of
the Gaussian version of the problem. Thus part 3 extends and is analogous
to the treatment for the case $k = 2$ given by Groeneboom, Jongbloed,
and Wellner (2001a). Finally, part 4 (Balabdaoui and Wellner (2004c))
gives joint asymptotic distribution theory at a fixed point
$x_0 \in (0, \infty)$ of the vector of centered and scaled derivative
estimators



$$ \left( n^{(k-j)/(2k+1)} \left( \bar g_n^{(j)}(x_0) - g_0^{(j)}(x_0) \right), \quad j = 0, \ldots, k-1 \right) $$



where $\bar g_n$ is either the MLE $\hat g_n$ or the LSE $\tilde g_n$,
under the assumption that $g_0^{(k)}(x_0)$ exists and is non-zero. This
yields behavior of the corresponding estimators of the mixing
distribution $F$ at fixed points (the inverse problem) as a corollary.



Thus the main outcome of parts 3 and 4 generalizes the asymptotic
distribution theory for estimating a nonincreasing density, and a
nonincreasing and convex density, at a fixed point: If $x_0 > 0$ and
$g_0$ is a $k$-monotone density defined on $(0, \infty)$ such that $g_0$
is $k$-times differentiable at $x_0$ with
$(-1)^k g_0^{(k)}(x_0) > 0$, and $g_0^{(k)}$ is assumed to be continuous
in a neighborhood of $x_0$, then our goal in parts 3 and 4 is to show
that



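$$
\begin{pmatrix}
n^{k/(2k+1)} \left( \bar g_n(x_0) - g_0(x_0) \right) \\
n^{(k-1)/(2k+1)} \left( \bar g_n^{(1)}(x_0) - g_0^{(1)}(x_0) \right) \\
\vdots \\
n^{1/(2k+1)} \left( \bar g_n^{(k-1)}(x_0) - g_0^{(k-1)}(x_0) \right)
\end{pmatrix}
\to_d
\begin{pmatrix}
c_0(g_0) H_k^{(k)}(0) \\
c_1(g_0) H_k^{(k+1)}(0) \\
\vdots \\
c_{k-1}(g_0) H_k^{(2k-1)}(0)
\end{pmatrix}
$$

and

$$ n^{1/(2k+1)} \left( \bar F_n(x_0) - F_0(x_0) \right) \to_d \frac{(-1)^k x_0^k}{k!} \, c_{k-1}(g_0) H_k^{(2k-1)}(0), $$

where $\bar g_n$ is either the MLE or LSE, $\bar F_n$ is the corresponding
estimator of the mixing distribution function $F_0$,

$$ c_j(g_0) = \left\{ \left( g_0(x_0) \right)^{k-j} \left( \frac{(-1)^k g_0^{(k)}(x_0)}{k!} \right)^{2j+1} \right\}^{1/(2k+1)} $$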

for $j = 0, \ldots, k-1$, and $H_k$ is an almost surely uniquely defined
stochastic process that is $(2k)$-convex (i.e., $H_k^{(2k-2)}$ exists and
is convex), and stays above (below) the $(k-1)$-fold integral of
two-sided Brownian motion plus a polynomial drift of the form
$t^{2k}/(2k)!$ if $k$ is even (odd). Only a change of scale is necessary
to realize that $H_1$ and $H_2$ are very closely related to the greatest
convex minorant of $W(t) + t^2$, $t \in \mathbb{R}$, where $W$ is
two-sided Brownian motion, and the "invelope", $H$, of

$$
\begin{cases}
\int_0^t W(s)\,ds + t^4, & \text{if } t \ge 0, \\
\int_t^0 W(s)\,ds + t^4, & \text{if } t < 0.
\end{cases}
$$

Deriving the rate of convergence of both the estimators $\hat g_n$ and
$\tilde g_n$ and their derivatives $\hat g_n^{(j)}$, $\tilde g_n^{(j)}$,
$j = 1, \ldots, k-1$, and proving the existence of the stochastic
processes $H_k$ for $k > 2$ involved in the joint asymptotic
distribution still depend on a key conjecture: that the distance between
two successive knots of the MLE or LSE that are in the neighborhood of
$x_0$ is $O_p(n^{-1/(2k+1)})$ as the sample size $n \to \infty$, and that
the distance between two successive points of touch between the
$(k-1)$-fold integral of two-sided Brownian motion plus $t^{2k}/(2k)!$
and $H_k$ is $O_p(1)$. Both problems are of the same nature and one can
go from the first to the second one via a simple scaling argument. We
refer to this common problem as the gap problem.

We will show in parts 3 and 4 that the gap problem can be reduced to
the solution of a certain problem related to Hermite interpolation. That
is, the gap problem has a solution if the following conjecture involving
Hermite interpolation is true. Consider Hermite interpolation (as
described for example in Nürnberger (1989), pages 108-109, or DeVore and
Lorentz (1993), pages 161-162) of some smooth function $f$ via splines of
odd degree. More specifically, if $f$ is some real-valued function in
$C^{(j)}[0, 1]$ for some $j \ge 2$,
$0 = y_0 < y_1 < \cdots < y_{2k-4} < y_{2k-3} = 1$ is a given increasing
sequence, and $\mathcal{H}f$ is the uniquely defined spline of degree
$2k-1$ with interior knots $y_1, \ldots, y_{2k-4}$ satisfying the
$4k-4$ conditions

$$ (\mathcal{H}f)(y_i) = f(y_i), \quad \text{and} \quad (\mathcal{H}f)'(y_i) = f'(y_i), \quad i = 0, \ldots, 2k-3, $$

then we conjecture that there exists a constant $C_{k,j}$ depending only
on $k$ and $j$ such that, if $j \ge k$,

$$ \sup_{0 < y_1 < \cdots < y_{2k-4} < 1} \| f - \mathcal{H}f \|_\infty \le C_{k,j} \| f^{(j)} \|_\infty, $$

where $\| \cdot \|_\infty$ is the supremum norm over $[0, 1]$.

This Hermite interpolation problem has apparently not been investigated 
in detail in the spline or approximation theory literature, and hence an 
analysis of the corresponding interpolation error is yet to be developed. It 
is, however, precisely the interpolation problem involved in understanding 
our least squares estimators, both for finite sample sizes and in the limiting 
Gaussian problem: as will be shown in parts 3 and 4, the connecting link is 
the classical theorem of Schoenberg and Whitney (1953) and its
generalization by Karlin and Ziegler (1966); see Nürnberger (1989), page 109,
or DeVore and Lorentz (1993), page 162. 

However, the approximation theory literature has considered a related
conjecture for another Hermite problem whose solution is a different
odd-degree spline, also called a complete spline. Given a function
$f \in C^{(k-1)}[0, 1]$, and an increasing sequence
$0 = y_0 < y_1 < \cdots < y_m < y_{m+1} = 1$, the complete spline
interpolant, $\mathcal{C}f$, of degree $2k-1$ with interior knots
$y_1, \ldots, y_m$ satisfies the $2k + m$ conditions

$$ (\mathcal{C}f)(y_i) = f(y_i), \quad i = 1, \ldots, m, $$

$$ (\mathcal{C}f)^{(l)}(y_0) = f^{(l)}(y_0), \quad (\mathcal{C}f)^{(l)}(y_{m+1}) = f^{(l)}(y_{m+1}), \quad l = 0, \ldots, k-1. $$

When $f$ is in $C^{(j)}[0,1]$ for $j \ge k$, the error in this more usual
Hermite problem is known to be uniformly bounded independently of the
location of the knots. Proof of this uniform boundedness is due to
Shadrin (1992). More precisely, the argument follows from his Theorem
6.4, page 94. de Boor (1974) had investigated the problem for $j = 2k$,
and conjectured uniformity of the bound for this particular case.
Furthermore, de Boor (1974) reduced the problem to a further conjecture:
for any $k > 4$, the supremum norm of the $L_2$-spline projector that
maps $C[0, 1]$ to the space of splines of degree $k-1$ with knots
$y_1, \ldots, y_m$ is bounded independently of the location of the
knots. This conjecture remained unsolved for more than 25 years: Shadrin
(2001) presents a proof thereof. Thus, there is a closely related
interpolation problem in which the bound on the interpolation error does
hold uniformly in the knots, and this gives some hope that "uniformity
in the knots" will hold in our problem as well.

In our Hermite interpolation problem, the spline interpolant matches not
only the value of the function at the knots but also the value of its
first derivative. So intuitively, one should expect our spline to
"behave better" than the complete spline, and the interpolation error to
be smaller. In addition, our conjecture is supported by numerical
evidence for $k = 3, 4, 5, 6$. Our computations suggest that for these
particular values, $C_{k,j} \le 1/((k-1)!\,(j-k)!)$.
For further details see Balabdaoui and Wellner (2005b).

2. The Maximum Likelihood and Least Squares estimators: Ex- 
istence and characterization. 

2.1. Mixture representation of a k-monotone density. Williamson (1956)
gave the following characterization of a $k$-monotone function on
$(0, \infty)$:

Theorem 2.1 (Williamson, 1956) A function $g$ is $k$-monotone on
$(0, \infty)$ if and only if there exists a nondecreasing function
$\gamma$ bounded at 0 such that

(1) $$ g(x) = \int_0^\infty (1 - tx)_+^{k-1} \, d\gamma(t), \qquad x > 0, $$

where $y_+ = y 1_{(0,\infty)}(y)$.



The next theorem gives an inversion formula for the function $\gamma$:

Theorem 2.2 (Williamson, 1956) If $g$ is of the form (1) with
$\gamma(0) = 0$, then at a continuity point $t > 0$, $\gamma$ is given by

$$ \gamma(t) = \sum_{j=0}^{k-1} \frac{(-1)^j g^{(j)}(1/t)}{j!} \left( \frac{1}{t} \right)^j . $$

For proofs of Theorems 2.1 and 2.2, see Williamson (1956). ■

From the characterization given in (1), we can easily derive another
integral representation for $k$-monotone functions that are Lebesgue
integrable on $(0, \infty)$; i.e., $\int_0^\infty g(x)\,dx < \infty$.

Lemma 2.1 (Integrable k-monotone characterization) A function $g$ is an
integrable $k$-monotone function if and only if it is of the form

(2) $$ g(x) = \int_0^\infty \frac{k (t - x)_+^{k-1}}{t^k} \, dF(t), \qquad x > 0, $$

where $F$ is nondecreasing and bounded on $(0, \infty)$. Thus $g$ is a
$k$-monotone density if and only if it is of the form (2) for some
distribution function $F$ on $(0, \infty)$.

Proof. This follows from Theorem 5 of Lévy (1962) by taking $k = n + 1$
and $f \equiv 0$ on $(-\infty, 0]$. ■

Lemma 2.2 (k-monotone inversion formula) If $F$ in (2) satisfies
$\lim_{t \to \infty} F(t) = \int_0^\infty g(x)\,dx$, then at a continuity
point $t > 0$, $F$ is given by

(3) $$ F(t) = G(t) - t g(t) + \cdots + \frac{(-1)^{k-1}}{(k-1)!} t^{k-1} g^{(k-2)}(t) + \frac{(-1)^k}{k!} t^k g^{(k-1)}(t), $$

where $G(t) = \int_0^t g(x)\,dx$.

Proof. By the mixture form in (2), we have for all $t > 0$

$$ F(\infty) - F(t) = \frac{(-1)^k}{k!} \int_t^\infty x^k \, dg^{(k-1)}(x). $$

But, for $j = 1, \ldots, k$, $t^j G^{(j)}(t) \to 0$ as $t \to \infty$.
This follows from Lemma 1 in Williamson (1956) applied to the
$(k+1)$-monotone function $G(\infty) - G(t)$. Therefore, for
$j = 1, \ldots, k$, $t^j g^{(j-1)}(t) \to 0$ as $t \to \infty$.
Now, using integration by parts, we can write

$$
\begin{aligned}
F(\infty) - F(t)
&= \frac{(-1)^k}{k!} \Big[ x^k g^{(k-1)}(x) \Big]_t^\infty
 + \frac{(-1)^{k-1}}{(k-1)!} \int_t^\infty x^{k-1} g^{(k-1)}(x)\,dx \\
&= -\frac{(-1)^k}{k!} t^k g^{(k-1)}(t)
 - \frac{(-1)^{k-1}}{(k-1)!} t^{k-1} g^{(k-2)}(t)
 + \frac{(-1)^{k-2}}{(k-2)!} \int_t^\infty x^{k-2} g^{(k-2)}(x)\,dx \\
&= -\frac{(-1)^k}{k!} t^k g^{(k-1)}(t)
 - \frac{(-1)^{k-1}}{(k-1)!} t^{k-1} g^{(k-2)}(t)
 - \cdots + t g(t) + \int_t^\infty g(x)\,dx .
\end{aligned}
$$

Using the fact that $F(\infty) = \int_0^\infty g(x)\,dx$, the result follows. ■
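
The inversion formula (3) can be checked in closed form for the standard exponential density $g(x) = e^{-x}$, for which $G(t) = 1 - e^{-t}$ and $g^{(j)}(t) = (-1)^j e^{-t}$. The sketch below is an illustration under that assumption; the function names are hypothetical. Each term $\frac{(-1)^j}{j!} t^j g^{(j-1)}(t)$ reduces to $-t^j e^{-t}/j!$, so formula (3) should return the Gamma$(k+1, 1)$ (Erlang) distribution function.

```python
import math


def mixing_cdf_via_inversion(t, k):
    """Right side of (3) for g(x) = exp(-x)."""
    value = 1.0 - math.exp(-t)  # G(t)
    for j in range(1, k + 1):
        g_deriv = (-1) ** (j - 1) * math.exp(-t)  # g^{(j-1)}(t)
        value += (-1) ** j / math.factorial(j) * t ** j * g_deriv
    return value


def erlang_cdf(t, shape):
    """Distribution function of Gamma(shape, 1) for integer shape."""
    partial = sum(t ** i / math.factorial(i) for i in range(shape))
    return 1.0 - math.exp(-t) * partial


# Largest discrepancy over a few values of k and t.
max_gap = max(
    abs(mixing_cdf_via_inversion(t, k) - erlang_cdf(t, k + 1))
    for k in range(1, 6)
    for t in (0.5, 2.0, 7.0)
)
```

The discrepancy is at the level of floating-point rounding, which is consistent with the exponential density being a Beta$(1, k)$ scale mixture with a Gamma$(k+1, 1)$ mixing distribution.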

For completeness and for comparison, we also give the corresponding char- 
acterization and inversion formula in the completely monotone case: 

Lemma 2.3 (Integrable completely monotone characterization) A function 
g is an integrable completely monotone function if and only if it is of the 
form 



(4) $$ g(x) = \int_0^\infty \frac{1}{t} \exp(-x/t) \, dF(t), \qquad x > 0, $$



where $F$ is nondecreasing and bounded on $(0, \infty)$. Thus $g$ is a
completely monotone density if and only if it is of the form (4) for some
distribution function $F$ on $(0, \infty)$.

Lemma 2.4 (Completely-monotone inversion formula) If $F$ in (4) satisfies
$\lim_{t \to \infty} F(t) = \int_0^\infty g(x)\,dx$, then at a continuity
point $t > 0$, $F$ is given by

(5) $$ F(t) = \lim_{k \to \infty} \sum_{j=0}^{k} \frac{(-1)^j}{j!} (kt)^j G^{(j)}(kt), $$

where $G(t) = \int_0^t g(x)\,dx$.

Proofs. Lemma 2.3 follows from the classical result of Bernstein; see Wid- 
der (1946), pages 141-163; Feller (1971), page 439; and Gneiting (1998). 
Lemma 2.4 follows from the development in Feller (1971), pages 232-233. 
For further details, see Balabdaoui and Wellner (2005a). ■ 
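
A quick sanity check of the inversion formula (5) is possible when the mixing distribution is a point mass at $a$, so that $g(x) = a^{-1} e^{-x/a}$ and $F(t) = 1\{t \ge a\}$. The sketch below is only an illustration: it assumes the sum in (5) runs over $j = 0, \ldots, k$, and it uses the simplification (derived from $G^{(j)}(u) = (-1)^{j+1} a^{-j} e^{-u/a}$ for $j \ge 1$) that the partial sum equals $1 - \sum_{j \le k} e^{-\lambda} \lambda^j / j!$ with $\lambda = kt/a$. The function name is hypothetical.

```python
import math


def inversion_partial_sum(t, a, k):
    """Partial sum in (5) for g(x) = exp(-x/a)/a, via its Poisson-tail form."""
    lam = k * t / a
    # Work on the log scale so that large k does not overflow.
    log_terms = [-lam + j * math.log(lam) - math.lgamma(j + 1) for j in range(k + 1)]
    return 1.0 - sum(math.exp(v) for v in log_terms)


a, k = 2.0, 400
below = inversion_partial_sum(1.0, a, k)  # t < a, target value F(t) = 0
above = inversion_partial_sum(4.0, a, k)  # t > a, target value F(t) = 1
```

For large $k$ the partial sum is close to 0 to the left of $a$ and close to 1 to the right, as the point-mass mixing distribution requires.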

The characterization in (2) is more relevant for us since we are dealing
with $k$-monotone densities. It is easy to see that if $g$ is a density,
and $F$ is chosen to be right-continuous and to satisfy the condition of
Lemma 2.2, then $F$ is a distribution function. For $k = 1$ ($k = 2$),
note that the characterization matches the well known fact that a density
is nonincreasing (nonincreasing and convex) on $(0, \infty)$ if and only
if it is a mixture of uniform densities (triangular densities). More
generally, the characterization establishes a one-to-one correspondence
between the class of $k$-monotone densities and the class of scale
mixtures of Beta densities with parameters 1 and $k$.
From the inversion formula in (3), one can see that a natural estimator
for the mixing distribution $F$ is obtained by plugging in an estimator
for the density $g$, and it becomes clear that the rate of convergence of
estimators of $F$ will be controlled by the corresponding rate of
convergence for estimators of the highest derivative $g^{(k-1)}$ of $g$.
When $k$ increases the densities become smoother, and therefore the
inverse problem of estimating the mixing distribution $F$ becomes harder.

In the next section, we consider the nonparametric Maximum Likelihood
and Least Squares Estimators of a $k$-monotone density $g_0$. We show
that these estimators exist and give characterizations thereof. In the
following, $\mathcal{M}_k$ is the class of all $k$-monotone functions on
$(0, \infty)$, $\mathcal{D}_k$ is the sub-class of $k$-monotone densities
on $(0, \infty)$, $X_1, \ldots, X_n$ are i.i.d. from $g_0$, and
$\mathbb{G}_n$ is their empirical distribution function,
$\mathbb{G}_n(x) = n^{-1} \sum_{i=1}^n 1\{X_i \le x\}$ for $x \ge 0$.
2.2. Maximum likelihood estimation of a k-monotone density. Let 

$$ l_n(g) = \int_0^\infty \log g(x) \, d\mathbb{G}_n(x) $$
be the log-likelihood function (really $n^{-1}$ times the log-likelihood
function, but we will abuse notation slightly in this same way
throughout). We want to maximize $l_n(g)$ over $g \in \mathcal{D}_k$. To
do this, it is frequently of help to change the optimization problem to
one over the whole cone $\mathcal{M}_k \cap L_1(\lambda)$. This can
be done by introducing the "adjusted likelihood function" $\psi_n(g)$
defined as follows:

$$ \psi_n(g) = \int_0^\infty \log g(x) \, d\mathbb{G}_n(x) - \int_0^\infty g(x)\,dx, $$

for $g \in \mathcal{M}_k \cap L_1(\lambda)$. Then, as in GJW (2001a),
Lemma 2.3, page 1661, the maximum likelihood estimator $\hat g_n$ also
maximizes $\psi_n(g)$ over $\mathcal{M}_k \cap L_1(\lambda)$.
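Using the integral representations established in the previous subsection,
$\psi_n$ can also be rewritten as

$$
\psi_n(F) =
\begin{cases}
\displaystyle \int_0^\infty \log\left( \int_0^\infty \frac{k (t-x)_+^{k-1}}{t^k} \, dF(t) \right) d\mathbb{G}_n(x)
 - \int_0^\infty \int_0^\infty \frac{k (t-x)_+^{k-1}}{t^k} \, dF(t)\,dx, & k \in \{1, 2, \ldots\}, \\[2ex]
\displaystyle \int_0^\infty \log\left( \int_0^\infty \frac{1}{t} \exp(-x/t) \, dF(t) \right) d\mathbb{G}_n(x)
 - \int_0^\infty \int_0^\infty \frac{1}{t} \exp(-x/t) \, dF(t)\,dx, & k = \infty,
\end{cases}
$$

where $F$ is bounded and nondecreasing.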

Lemma 2.5 The maximum likelihood estimator $\hat g_{n,k}$ in the classes
$\mathcal{D}_k$, $k \in \{1, 2, \ldots, \infty\}$ exists. Furthermore,
$\hat g_{n,k}$ is the maximizer of $\psi_n$ over
$\mathcal{M}_k \cap L_1(\lambda)$. Moreover, for
$k \in \{1, 2, \ldots\}$ the density $\hat g_{n,k}$ is of the form

$$ \hat g_{n,k}(x) = \hat w_1 \frac{k (\hat a_1 - x)_+^{k-1}}{\hat a_1^k} + \cdots + \hat w_m \frac{k (\hat a_m - x)_+^{k-1}}{\hat a_m^k}, $$

for some $m = \hat m_k$, while for $k = \infty$, $\hat g_{n,\infty}$ is of
the form

$$ \hat g_{n,\infty}(x) = \frac{\hat w_1}{\hat a_1} \exp(-x/\hat a_1) + \cdots + \frac{\hat w_m}{\hat a_m} \exp(-x/\hat a_m) $$

for some $m = \hat m_\infty$, where $\hat w_1, \ldots, \hat w_m$ and
$\hat a_1, \ldots, \hat a_m$ are respectively the weights and the support
points of the maximizing mixing distribution $\hat F_{n,k}$.

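Proof. First, we prove that there exists a density $\hat g_n$ that
maximizes the "usual" log-likelihood
$l_n = \int_0^\infty \log g(x)\,d\mathbb{G}_n(x)$ over the class
$\mathcal{D}_k$ with $k$ finite. For $g$ in $\mathcal{D}_k$, let $F$ be
the distribution function such that

$$ g(x) = \int_0^\infty \frac{k (t - x)_+^{k-1}}{t^k} \, dF(t). $$

The unicomponent likelihood curve $\Gamma$ as defined by Lindsay (1983a)
(see also Lindsay (1995)) is then

$$ \Gamma = \left\{ \left( \frac{k (y - X_1)_+^{k-1}}{y^k}, \ldots, \frac{k (y - X_n)_+^{k-1}}{y^k} \right) : y \in (0, \infty) \right\}. $$

It is easy to see that $\Gamma$ is bounded (notice that the $i$-th
component is equal to 0 whenever $y < X_i$). Also, $\Gamma$ is closed. By
Theorems 18 and 22 of Lindsay (1995), there exists a unique maximizer of
$l_n$ and the maximum is achieved by a discrete distribution function
that has at most $n$ support points.

Now, let $g$ be a $k$-monotone function in
$\mathcal{M}_k \cap L_1(\lambda)$ and let $\int_0^\infty g(x)\,dx = c$ so
that $g/c \in \mathcal{D}_k$. We have

$$
\psi_n(g) - \psi_n(\hat g_n)
 = \int_0^\infty \log\left( \frac{g(x)}{c} \right) d\mathbb{G}_n(x) + \log(c) - c
 - \int_0^\infty \log \hat g_n(x) \, d\mathbb{G}_n(x) + 1
 \le \log(c) - c + 1 \le 0,
$$

since $\log(c) \le c - 1$. Thus $\psi_n$ is maximized over
$\mathcal{M}_k \cap L_1(\lambda)$ by $\hat g_n \in \mathcal{D}_k$.

In the case $k = \infty$, the assertions of the lemma are proved by Jewell
(1982). ■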
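
The scaling step in this proof can be illustrated numerically. The sketch below assumes a finite mixture of scaled Beta$(1, k)$ kernels with hypothetical data, weights and support points; all names are illustrative. Since each kernel integrates to one, the total mass of the mixture equals the sum of its weights, and rescaling a density $g$ by $c$ should change the adjusted likelihood by exactly $\log(c) - c + 1 \le 0$.

```python
import math


def beta_kernel(x, a, k):
    """Scaled Beta(1, k) density k (a - x)_+^{k-1} / a^k at x."""
    return k * (a - x) ** (k - 1) / a ** k if 0.0 <= x < a else 0.0


def adjusted_loglik(data, k, weights, supports):
    """psi_n(g) for g = sum_i w_i * kernel(.; a_i); the mass of g is sum_i w_i."""
    def g(x):
        return sum(w * beta_kernel(x, a, k) for w, a in zip(weights, supports))

    loglik = sum(math.log(g(x)) for x in data) / len(data)
    return loglik - sum(weights)


data = [0.2, 0.5, 1.1]                       # hypothetical sample
k, supports, weights = 3, [2.0, 4.0], [0.6, 0.4]

base = adjusted_loglik(data, k, weights, supports)
scaled = {
    c: adjusted_loglik(data, k, [c * w for w in weights], supports)
    for c in (0.5, 0.9, 1.0, 1.3, 2.0)
}
```

Every entry of `scaled` is at most `base`, with equality only at $c = 1$, which is the reason the maximizer over the cone is automatically a density.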

The following lemma gives a necessary and sufficient condition for a point
$t$ to be in the support of the maximizing distribution function
$\hat F_{n,k}$. For $k \in \{3, \ldots\}$ it generalizes Lemma 2.4, page
1662, of Groeneboom, Jongbloed, and Wellner (2001b).

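Lemma 2.6 Let $X_1, \ldots, X_n$ be i.i.d. random variables from the true
density $g_0$, and let $\hat F_{n,k}$ and $\hat g_{n,k}$ be the MLE of the
mixing and mixed distribution respectively. Then, for
$k \in \{1, 2, \ldots\}$,

(6) $$ \hat H_{n,k}(t) \equiv \mathbb{G}_n\left( \frac{k (t - X)_+^{k-1} / t^k}{\hat g_{n,k}(X)} \right) \le 1, \qquad \text{for all } t > 0, $$

with equality if and only if
$t \in \mathrm{supp}(\hat F_{n,k}) = \{\hat a_1, \ldots, \hat a_m\}$. In
the case $k = \infty$

(7) $$ \hat H_{n,\infty}(t) \equiv \mathbb{G}_n\left( \frac{\exp(-X/t)}{t \, \hat g_{n,\infty}(X)} \right) \le 1, \qquad \text{for all } t > 0, $$

with equality if and only if
$t \in \mathrm{supp}(\hat F_{n,\infty}) = \{\hat a_1, \ldots, \hat a_m\}$.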

Remark 2.1 By factoring out $t^{k-1}$ and replacing $t$ by $kv$ (say), it
becomes clear that the function $\hat H_{n,\infty}$ on the right side of
(7) is a natural limiting version as $k \to \infty$ of the functions
$\hat H_{n,k}$ on the right side of (6).
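
The inequality in (6) can be verified by hand in the simplest case. The sketch below assumes a single observation $X_1 = x_0$ and $k \ge 2$; the likelihood is then maximized by a point mass at $a = k x_0$, the maximizer of $a \mapsto k (a - x_0)^{k-1} / a^k$. The function names and numerical values are illustrative assumptions only.

```python
def beta_kernel(x, a, k):
    """Scaled Beta(1, k) density k (a - x)_+^{k-1} / a^k at x."""
    return k * (a - x) ** (k - 1) / a ** k if 0.0 <= x < a else 0.0


def mixture_density(x, k, weights, supports):
    """Finite mixture of scaled Beta(1, k) kernels evaluated at x."""
    return sum(w * beta_kernel(x, a, k) for w, a in zip(weights, supports))


def H_nk(t, data, k, weights, supports):
    """Empirical average of kernel(X_i; t) / g_hat(X_i), as in (6)."""
    ratios = [
        beta_kernel(x, t, k) / mixture_density(x, k, weights, supports)
        for x in data
    ]
    return sum(ratios) / len(ratios)


k, x0 = 3, 1.5
data, weights, supports = [x0], [1.0], [k * x0]

grid = [x0 + 0.01 * i for i in range(1, 2000)]
largest = max(H_nk(t, data, k, weights, supports) for t in grid)
at_support = H_nk(k * x0, data, k, weights, supports)
```

On the grid the function never exceeds one, and it equals one at the single support point $k x_0$, in line with the characterization.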

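Proof. Since $\hat F_n$ maximizes the log-likelihood

$$ l_n(F) = \frac{1}{n} \sum_{j=1}^n \log\left( \int_0^\infty \frac{k (t - X_j)_+^{k-1}}{t^k} \, dF(t) \right), $$

it follows that for all $t > 0$

$$ \lim_{\epsilon \searrow 0} \frac{l_n((1 - \epsilon) \hat F_n + \epsilon \delta_t) - l_n(\hat F_n)}{\epsilon} \le 0. $$

This yields

$$ \frac{1}{n} \sum_{j=1}^n \frac{k (t - X_j)_+^{k-1} / t^k - \hat g_n(X_j)}{\hat g_n(X_j)} \le 0, $$

or

(8) $$ \frac{1}{n} \sum_{j=1}^n \frac{k (t - X_j)_+^{k-1} / t^k}{\hat g_n(X_j)} \le 1. $$

Now, let $M_n$ be the set defined by

$$ M_n = \left\{ t > 0 : \frac{1}{n} \sum_{j=1}^n \frac{k (t - X_j)_+^{k-1} / t^k}{\hat g_n(X_j)} = 1 \right\}. $$

We will prove now that $M_n = \mathrm{supp}(\hat F_n)$. We write
$P_{\hat F_n}$ for the probability measure associated with $\hat F_n$.
Integrating the left hand side of (8) with respect to $\hat F_n$, we have

$$ \frac{1}{n} \sum_{j=1}^n \frac{\int_0^\infty k (t - X_j)_+^{k-1} / t^k \, d\hat F_n(t)}{\hat g_n(X_j)} = 1. $$

But, using the definition of $M_n$, we can write,

$$ 1 = P_{\hat F_n}(M_n) + \int_{\mathbb{R}^+ \setminus M_n} \frac{1}{n} \sum_{j=1}^n \frac{k (t - X_j)_+^{k-1} / t^k}{\hat g_n(X_j)} \, d\hat F_n(t), $$

and so

$$ P_{\hat F_n}(\mathbb{R}^+ \setminus M_n) = \int_{\mathbb{R}^+ \setminus M_n} \frac{1}{n} \sum_{j=1}^n \frac{k (t - X_j)_+^{k-1} / t^k}{\hat g_n(X_j)} \, d\hat F_n(t) < P_{\hat F_n}(\mathbb{R}^+ \setminus M_n), \quad \text{if } P_{\hat F_n}(\mathbb{R}^+ \setminus M_n) > 0. $$

This is a contradiction and we conclude that
$P_{\hat F_n}(\mathbb{R}^+ \setminus M_n) = 0$.

The proof of the result for $k = \infty$ is given by Jewell (1982), page 481.

2.3. The Least Squares estimator of a k-monotone density. The least
squares criterion is

(9) $$ Q_n(g) = \frac{1}{2} \int_0^\infty g^2(x)\,dx - \int_0^\infty g(x) \, d\mathbb{G}_n(x). $$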

We want to minimize this over $g \in \mathcal{D}_k \cap L_2(\lambda)$, the
subset of square integrable $k$-monotone densities. Although existence of
a minimizer of $Q_n$ over $\mathcal{D}_k \cap L_2(\lambda)$ is quite
easily established, the minimizer has a somewhat complicated
characterization due to the density constraint
$\int_0^\infty g(x)\,dx = 1$. Therefore we will actually consider the
alternative optimization problem of minimizing $Q_n(g)$ over
$\mathcal{M}_k \cap L_2(\lambda)$. In this optimization problem existence
requires more work, but the resulting characterization of the estimator
is considerably simpler. Further we will show that even though the
resulting estimator does not necessarily have total mass one, it does
have total mass converging almost surely to one and it consistently
estimates $g_0 \in \mathcal{D}_k$.

Using arguments similar to those in the proof of Theorem 1 in Williamson
(1956), one can show that $g \in \mathcal{M}_k$ if and only if

$$ g(x) = \int_0^\infty (t - x)_+^{k-1} \, d\mu(t) $$

for a positive measure $\mu$ on $(0, \infty)$. Thus we can rewrite the
criterion in terms of the corresponding measures $\mu$: by Fubini's
theorem

$$ \int_0^\infty g^2(x)\,dx = \int_0^\infty \int_0^\infty r_k(t, t') \, d\mu(t) \, d\mu(t') $$

where

$$ r_k(t, t') = \int_0^\infty (t - x)_+^{k-1} (t' - x)_+^{k-1} \, dx = \int_0^{t \wedge t'} (t - x)^{k-1} (t' - x)^{k-1} \, dx, $$

and

$$ \int_0^\infty g(x) \, d\mathbb{G}_n(x) = \int_0^\infty \int_0^\infty (t - x)_+^{k-1} \, d\mu(t) \, d\mathbb{G}_n(x) = \int_0^\infty s_{n,k}(t) \, d\mu(t) $$

where

$$ s_{n,k}(t) = \mathbb{G}_n\left( (t - X)_+^{k-1} \right). $$

Hence it follows that, with $g = g_\mu$,

$$ Q_n(g) = \frac{1}{2} \int_0^\infty \int_0^\infty r_k(t, t') \, d\mu(t) \, d\mu(t') - \int_0^\infty s_{n,k}(t) \, d\mu(t) \equiv \Phi_n(\mu). $$

Now we want to minimize $\Phi_n$ over the set $X$ of all non-negative
measures $\mu$ on $\mathbb{R}^+$. Since $\Phi_n$ is convex and can be
restricted to a subset $C$ of $X$ on which it is lower semicontinuous, a
solution exists and is unique.
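
For a discrete measure the reparametrization by Fubini's theorem can be checked numerically. The following sketch assumes $k \ge 2$, a measure $\mu$ with two atoms, hypothetical data, and a simple midpoint rule; all names are illustrative. It evaluates the criterion once directly from $g_\mu$ and once through $r_k$ and $s_{n,k}$.

```python
def midpoint(f, lo, hi, steps=4000):
    """Composite midpoint rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


def r_k(t, u, k):
    """r_k(t, u): integral over [0, min(t, u)] of (t - x)^{k-1} (u - x)^{k-1}."""
    return midpoint(lambda x: (t - x) ** (k - 1) * (u - x) ** (k - 1), 0.0, min(t, u))


def s_nk(t, data, k):
    """s_{n,k}(t): empirical mean of (t - X_i)_+^{k-1} (k >= 2 assumed)."""
    return sum(max(t - x, 0.0) ** (k - 1) for x in data) / len(data)


def phi_n(atoms, masses, data, k):
    """Criterion written in terms of the discrete measure mu."""
    pairs = list(zip(atoms, masses))
    quad = sum(mi * mj * r_k(ti, tj, k) for ti, mi in pairs for tj, mj in pairs)
    lin = sum(mi * s_nk(ti, data, k) for ti, mi in pairs)
    return 0.5 * quad - lin


def q_n(atoms, masses, data, k):
    """Criterion (9) evaluated directly from g_mu."""
    def g(x):
        return sum(m * max(t - x, 0.0) ** (k - 1) for t, m in zip(atoms, masses))

    return 0.5 * midpoint(lambda x: g(x) ** 2, 0.0, max(atoms)) - sum(g(x) for x in data) / len(data)


atoms, masses = [1.0, 2.5], [0.4, 0.1]       # hypothetical measure mu
data, k = [0.3, 0.8, 1.7], 3                 # hypothetical sample
gap = abs(phi_n(atoms, masses, data, k) - q_n(atoms, masses, data, k))
```

The two evaluations agree up to quadrature error, and the first form makes it visible that the criterion is a quadratic function of the masses.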

Proposition 2.1 The problem of minimizing $\Phi_n(\mu)$ over all
non-negative measures $\mu$ has a unique solution $\tilde\mu$.

Proof. Existence follows from Zeidler (1985), Theorem 38.B, page 152.
Here we verify the hypotheses of that theorem.

We identify $X$ of Zeidler's theorem with the space $X$ of nonnegative
measures on $[0, \infty)$, and we show that we can take $M$ of Zeidler's
theorem to be

$$ C = \left\{ \mu \in X : \mu(t, \infty) \le D t^{-(k - 1/2)} \right\} $$

for some constant $D < \infty$.

First, we can, without loss, restrict the minimization to the space of
non-negative measures on $[X_{(1)}, \infty)$ where $X_{(1)} > 0$ is the
first order statistic of the data. To see this, note that we can
decompose any measure $\mu$ as $\mu = \mu_1 + \mu_2$ where $\mu_1$ is
concentrated on $[0, X_{(1)})$ and $\mu_2$ is concentrated on
$[X_{(1)}, \infty)$. Since the second term of $\Phi_n$ is zero for
$\mu_1$, the contribution of the $\mu_1$ component to $\Phi_n(\mu)$ is
always non-negative, so we make $\inf \Phi_n(\mu)$ no larger by
restricting to measures on $[X_{(1)}, \infty)$.

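We can restrict further to measures $\mu$ with
$\int_0^\infty t^{k-1} \, d\mu(t) \le D$ for some finite $D = D_\omega$.
To show this, we first give a lower bound for $r_k(s, t)$.

For $s, t \ge t_0 > 0$ we have

(10) $$ r_k(s, t) \ge \frac{(1 - e^{-v_0}) t_0}{2k} \, s^{k-1} t^{k-1}, $$

where $v_0 \approx 1.59$. To prove (10) we will use the inequality

(11) $$ (1 - v/k)^{k-1} \ge e^{-v}, \qquad 0 \le v \le v_0, \quad k \ge 2. $$

(This inequality holds by straightforward computation; see Hall and
Wellner (1979), especially their Proposition 2.) Thus we compute

$$
\begin{aligned}
r_k(s, t) &= \int_0^\infty (s - x)_+^{k-1} (t - x)_+^{k-1} \, dx \\
&= s^{k-1} t^{k-1} \int_0^\infty (1 - x/s)_+^{k-1} (1 - x/t)_+^{k-1} \, dx \\
&= \frac{1}{k} s^{k-1} t^{k-1} \int_0^\infty \left( 1 - \frac{y}{ks} \right)_+^{k-1} \left( 1 - \frac{y}{kt} \right)_+^{k-1} dy \\
&\ge \frac{1}{k} s^{k-1} t^{k-1} \int_0^{v_0 (t \wedge s)} e^{-y/s} e^{-y/t} \, dy \\
&= \frac{1}{k} s^{k-1} t^{k-1} \int_0^{v_0 (t \wedge s)} e^{-cy} \, dy, \qquad c = 1/s + 1/t, \\
&= \frac{1}{k} s^{k-1} t^{k-1} \, \frac{1}{c} \left( 1 - \exp(-c (t \wedge s) v_0) \right) \\
&\ge \frac{1}{k} s^{k-1} t^{k-1} \, \frac{1}{c} \left( 1 - \exp(-v_0) \right)
\end{aligned}
$$

since

$$
c (s \wedge t) = \frac{s + t}{st} (s \wedge t) =
\begin{cases}
(t + s)/t, & s \le t, \\
(t + s)/s, & s \ge t,
\end{cases}
\quad \ge 1.
$$

But we also have

$$ \frac{1}{c} = \frac{1}{(1/s) + (1/t)} = \frac{st}{s + t} \ge \frac{1}{2} (s \wedge t) \ge \frac{1}{2} t_0 $$

for $s, t \ge t_0$, so we conclude that (10) holds.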
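
Both inequalities are easy to probe numerically. The sketch below checks (11) on a grid of $v \in [0, v_0]$ for $k = 2, \ldots, 10$, and checks (10) with $t_0 = s \wedge t$ for a few values of $s$, $t$ and $k$; the value $v_0 = 1.59$ is taken from the text, while the grids, the midpoint rule and the function names are assumptions of this illustration.

```python
import math

V0 = 1.59  # the constant v_0 from the text


def r_k(s, t, k, steps=2000):
    """Midpoint-rule value of the integral of (s - x)^{k-1} (t - x)^{k-1} over [0, min(s, t)]."""
    h = min(s, t) / steps
    return h * sum(
        (s - x) ** (k - 1) * (t - x) ** (k - 1)
        for x in ((i + 0.5) * h for i in range(steps))
    )


def lower_bound(s, t, k, t0):
    """Right side of (10)."""
    return (1.0 - math.exp(-V0)) * t0 / (2 * k) * s ** (k - 1) * t ** (k - 1)


ineq11_ok = all(
    (1.0 - v / k) ** (k - 1) >= math.exp(-v)
    for k in range(2, 11)
    for v in (V0 * i / 200 for i in range(201))
)

points = (0.5, 1.0, 2.0, 5.0)
ineq10_ok = all(
    r_k(s, t, k) >= lower_bound(s, t, k, min(s, t))
    for k in (2, 3, 5)
    for s in points
    for t in points
)
```

The tightest case of (11) on this grid is $k = 2$ at $v = v_0$, where the two sides differ only in the third decimal place; this is what pins down the constant $v_0$.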

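From the inequality (10) we conclude that for measures $\mu$ concentrated
on $[X_{(1)}, \infty)$ we have

$$ \int\!\!\int r_k(s, t) \, d\mu(s) \, d\mu(t) \ge \frac{(1 - e^{-v_0}) X_{(1)}}{2k} \left( \int_0^\infty t^{k-1} \, d\mu(t) \right)^2 . $$

On the other hand,

$$ \int_0^\infty s_{n,k}(t) \, d\mu(t) \le \int_0^\infty t^{k-1} \, d\mu(t). $$

Combining these two inequalities it follows that for any measure $\mu$
concentrated on $[X_{(1)}, \infty)$ we have

$$
\Phi_n(\mu) = \frac{1}{2} \int\!\!\int r_k(t, s) \, d\mu(t) \, d\mu(s) - \int s_{n,k}(t) \, d\mu(t)
 \ge A m_{k-1}^2 - m_{k-1} = m_{k-1} (A m_{k-1} - 1),
$$

where $m_{k-1} \equiv \int_0^\infty t^{k-1} \, d\mu(t)$ and
$A \equiv (1 - e^{-v_0}) X_{(1)} / (4k)$. This lower bound is strictly
positive if

$$ m_{k-1} > 1/A = \frac{4k}{(1 - e^{-v_0}) X_{(1)}} . $$

But for such measures $\mu$ we can make $\Phi_n$ smaller by taking the
zero measure. Thus we may restrict the minimization problem to the
collection of measures $\mu$ satisfying

(12) $$ m_{k-1} \le 1/A. $$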

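Now we decompose any measure $\mu$ on $[X_{(1)}, \infty)$ as
$\mu = \mu_1 + \mu_2$ where $\mu_1$ is concentrated on
$[X_{(1)}, M X_{(n)}]$ and $\mu_2$ is concentrated on
$(M X_{(n)}, \infty)$ for some (large) $M > 0$. Then it follows that

$$
\begin{aligned}
\Phi_n(\mu) &\ge \frac{1}{2} \int\!\!\int r_k(t, s) \, d\mu_2(t) \, d\mu_2(s) - \int_0^\infty t^{k-1} \, d\mu(t) \\
&\ge \frac{(1 - e^{-v_0}) M X_{(n)}}{4k} (M X_{(n)})^{2k-2} \mu(M X_{(n)}, \infty)^2 - 1/A \\
&\equiv B \mu(M X_{(n)}, \infty)^2 - 1/A > 0
\end{aligned}
$$

if

$$ \mu(M X_{(n)}, \infty)^2 > \frac{1}{AB} = \frac{4k}{(1 - e^{-v_0}) X_{(1)}} \cdot \frac{4k}{(1 - e^{-v_0}) (M X_{(n)})^{2k-1}}, $$

and hence we can restrict to measures $\mu$ with

$$ \mu(M X_{(n)}, \infty) \le \frac{4k}{(1 - e^{-v_0})} \, \frac{1}{X_{(1)}^{1/2} X_{(n)}^{k-1/2}} \, M^{-(k-1/2)} $$

for every $M \ge 1$. But this implies that $\mu$ satisfies

$$ \int_0^\infty t^{k-3/4} \, d\mu(t) \le D $$

for some $0 < D = D_\omega < \infty$, and this implies that $t^{k-1}$ is
uniformly integrable over $\mu \in C$. Alternatively, for
$\lambda \ge 1$ we have

$$
\begin{aligned}
\int_{t > \lambda} t^{k-1} \, d\mu(t) &= \lambda^{k-1} \mu(\lambda, \infty) + (k-1) \int_\lambda^\infty s^{k-2} \mu(s, \infty) \, ds \\
&\le \lambda^{k-1} \frac{K}{\lambda^{k-1/2}} + (k-1) \int_\lambda^\infty s^{k-2} K s^{-(k-1/2)} \, ds \\
&= K \lambda^{-1/2} + (k-1) K \int_\lambda^\infty s^{-3/2} \, ds \\
&\le K \lambda^{-1/2} + (k-1) 2 K \lambda^{-1/2} \\
&\to 0 \quad \text{as } \lambda \to \infty
\end{aligned}
$$

uniformly in $\mu \in C$.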

This implies that for $\{\mu_m\} \subset C$ satisfying
$\mu_m \Rightarrow \mu_0$ we have

$$ \limsup_{m \to \infty} \int_0^\infty s_{n,k}(t) \, d\mu_m(t) \le \int_0^\infty s_{n,k}(t) \, d\mu_0(t), $$

and hence $\Phi_n$ is lower-semicontinuous on $C$:

$$ \liminf_{m \to \infty} \Phi_n(\mu_m) \ge \Phi_n(\mu_0). $$

Since $\Phi_n$ is lower semi-compact (i.e. the sets
$C_r = \{\mu \in C : \Phi_n(\mu) \le r\}$ are compact for
$r \in \mathbb{R}$), the existence of a minimum follows from Zeidler
(1985), Theorem 38.B, page 152. Uniqueness follows from the strict
convexity of $\Phi_n$.



The following proposition characterizes the least squares estimators.

Proposition 2.2 For $k \in \{1, 2, \ldots\}$ define $Y_{n,k}$ and
$\tilde H_{n,k}$ respectively by

$$ Y_{n,k}(x) = \int_0^x \int_0^{t_{k-1}} \cdots \int_0^{t_2} \mathbb{G}_n(t_1) \, dt_1 \, dt_2 \cdots dt_{k-1}, \qquad x \ge 0, $$

and

$$ \tilde H_{n,k}(x) = \int_0^x \int_0^{t_k} \cdots \int_0^{t_2} \tilde g_n(t_1) \, dt_1 \, dt_2 \cdots dt_k, \qquad x \ge 0. $$

Then $\tilde g_{n,k}$ is the LS estimator over
$\mathcal{M}_k \cap L_2(\lambda)$ if and only if the following
conditions are satisfied for $\tilde g_{n,k}$ and $\tilde H_{n,k}$:

(13) $$
\begin{cases}
\tilde H_{n,k}(t) \ge Y_{n,k}(t), & \text{for } t \ge 0, \text{ and} \\
\tilde H_{n,k}(t) = Y_{n,k}(t), & \text{for } t \in \mathrm{supp}\{\tilde F_{n,k}\}.
\end{cases}
$$


Remark 2.2 Note that for $k \in \{1, 2, \ldots\}$ the processes $Y_{n,k}$
and $\tilde H_{n,k}$ can be written in the more compact forms

$$ Y_{n,k}(t) = \int_0^t \frac{(t - x)^{k-1}}{(k-1)!} \, d\mathbb{G}_n(x) $$

and

$$ \tilde H_{n,k}(t) = \int_0^t \frac{(t - x)^{k-1}}{(k-1)!} \, \tilde g_n(x) \, dx. $$
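
The compact form of $Y_{n,k}$ can be compared with its definition as an iterated integral. Because $Y_{n,k}(t) = \int_0^t Y_{n,k-1}(u)\,du$ with $Y_{n,1} = \mathbb{G}_n$, it suffices to check this one-step recursion. The sketch below does so with a midpoint rule on hypothetical data; the function names and settings are illustrative assumptions.

```python
import math


def Y_compact(t, data, k):
    """Compact form: empirical mean of (t - X_i)_+^{k-1} / (k - 1)!."""
    total = sum((t - x) ** (k - 1) for x in data if x <= t)
    return total / (len(data) * math.factorial(k - 1))


def midpoint(f, lo, hi, steps=5000):
    """Composite midpoint rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


data = [0.4, 0.9, 1.3, 2.2]   # hypothetical sample
t = 3.0

# Y_{n,k}(t) versus the integral of Y_{n,k-1} over [0, t], for k = 2, 3, 4.
gaps = {
    k: abs(Y_compact(t, data, k) - midpoint(lambda u: Y_compact(u, data, k - 1), 0.0, t))
    for k in (2, 3, 4)
}
```

For $k = 2$ the inner function is the empirical distribution function itself, so the remaining gap is only the quadrature error at its jumps.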



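Proof. Let $\tilde g_n \in \mathcal{M}_k \cap L_2(\lambda)$ satisfy (13), and let $g$ be an arbitrary function in $\mathcal{M}_k \cap L_2(\lambda)$. Then
\[
Q_n(g) - Q_n(\tilde g_n) = \frac12\int g^2(x)\,dx - \frac12\int \tilde g_n^2(x)\,dx - \int g(x)\,d\mathbb{G}_n(x) + \int \tilde g_n(x)\,d\mathbb{G}_n(x).
\]
Now, using integration by parts
\begin{align*}
\int_0^\infty (g(x) - \tilde g_n(x))\,d\mathbb{G}_n(x)
 &= -\int_0^\infty \mathbb{G}_n(x)\,(g'(x) - \tilde g_n'(x))\,dx \\
 &= \int_0^\infty \left(\int_0^x \mathbb{G}_n(y)\,dy\right)(g''(x) - \tilde g_n''(x))\,dx \\
 &= \cdots \\
 &= (-1)^k \int_0^\infty Y_n(x)\,\bigl(dg^{(k-1)}(x) - d\tilde g_n^{(k-1)}(x)\bigr),
\end{align*}
and
\begin{align*}
\int_0^\infty (g^2(x) - \tilde g_n^2(x))\,dx
 &= \int_0^\infty (g(x) + \tilde g_n(x))(g(x) - \tilde g_n(x))\,dx \\
 &= -\int_0^\infty \left(\int_0^x g(y)\,dy + \int_0^x \tilde g_n(y)\,dy\right)(g'(x) - \tilde g_n'(x))\,dx \\
 &= \cdots \\
 &= (-1)^k \int_0^\infty (G_k(x) + \tilde H_n(x))\,\bigl(dg^{(k-1)}(x) - d\tilde g_n^{(k-1)}(x)\bigr),
\end{align*}
where $G_k$ is the $k$-th order integral of $g$. Hence,
\begin{align*}
Q_n(g) - Q_n(\tilde g_n)
 &= \frac12(-1)^k \int_0^\infty (G_k(x) + \tilde H_n(x))\,\bigl(dg^{(k-1)}(x) - d\tilde g_n^{(k-1)}(x)\bigr) \\
 &\quad - (-1)^k \int_0^\infty Y_n(x)\,\bigl(dg^{(k-1)}(x) - d\tilde g_n^{(k-1)}(x)\bigr) \\
 &= \frac12(-1)^k \int_0^\infty (G_k(x) - \tilde H_n(x))\,\bigl(dg^{(k-1)}(x) - d\tilde g_n^{(k-1)}(x)\bigr) \\
 &\quad + (-1)^k \int_0^\infty (\tilde H_n(x) - Y_n(x))\,\bigl(dg^{(k-1)}(x) - d\tilde g_n^{(k-1)}(x)\bigr) \\
 &\ge (-1)^k \int_0^\infty (\tilde H_n(x) - Y_n(x))\,\bigl(dg^{(k-1)}(x) - d\tilde g_n^{(k-1)}(x)\bigr).
\end{align*}
To see that, we notice (using integration by parts) that
\[
(-1)^k \int_0^\infty (G_k(x) - \tilde H_n(x))\,\bigl(dg^{(k-1)}(x) - d\tilde g_n^{(k-1)}(x)\bigr) = \int_0^\infty (g(x) - \tilde g_n(x))^2\,dx.
\]
But condition (13) implies that
\[
\int_0^\infty (\tilde H_n(x) - Y_n(x))\,d\tilde g_n^{(k-1)}(x) = 0.
\]
Therefore,
\[
Q_n(g) - Q_n(\tilde g_n) \ge \int_0^\infty (\tilde H_n(x) - Y_n(x))\,(-1)^k\,dg^{(k-1)}(x) \ge 0,
\]
since $\tilde H_n \ge Y_n$ and $(-1)^{k-2}\,dg^{(k-1)}(x) = (-1)^k\,dg^{(k-1)}(x) \ge 0$ because $(-1)^{k-2} g^{(k-2)}$ is convex.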
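Conversely, take $g_t \in \mathcal{M}_k$ to be
\[
g_t(x) = \frac{(t-x)_+^{k-1}}{(k-1)!}.
\]
We have:
\[
\lim_{\epsilon \searrow 0} \frac{Q_n(\tilde g_n + \epsilon g_t) - Q_n(\tilde g_n)}{\epsilon} = \int_0^\infty g_t(x)\tilde g_n(x)\,dx - \int_0^\infty g_t(x)\,d\mathbb{G}_n(x).
\]
Using integration by parts, we obtain
\[
0 \le \lim_{\epsilon \searrow 0} \frac{Q_n(\tilde g_n + \epsilon g_t) - Q_n(\tilde g_n)}{\epsilon} = \tilde H_n(t) - Y_n(t).
\]
Finally, since $\tilde g_n$ minimizes $Q_n$ it follows that
\begin{align*}
0 = \lim_{\epsilon \to 0} \frac{Q_n((1+\epsilon)\tilde g_n) - Q_n(\tilde g_n)}{\epsilon}
 &= \int_0^\infty \tilde g_n^2(x)\,dx - \int_0^\infty \tilde g_n(x)\,d\mathbb{G}_n(x) \\
 &= \int_0^\infty (\tilde H_n(x) - Y_n(x))\,(-1)^k\,d\tilde g_n^{(k-1)}(x),
\end{align*}
which holds if and only if the equality in (13) holds. ■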

In order to prove that the LSE is a spline of degree $k-1$, we need the following result.

Lemma 2.7 Let $[a,b] \subset (0,\infty)$ and let $g$ be a nonnegative and nonincreasing function on $[a,b]$. For any polynomial $P_{k-1}$ of degree $\le k-1$ on $[a,b]$, if the function
\[
\Delta(t) = \int_0^t (t-s)^{k-1} g(s)\,ds - P_{k-1}(t), \quad t \in [a,b],
\]
admits infinitely many zeros in $[a,b]$, then there exists $t_0 \in [a,b]$ such that $g \equiv 0$ on $[t_0,b]$ and $g > 0$ on $[a,t_0)$ if $t_0 > a$.

Proof. By applying the mean value theorem $k$ times, it follows that $(k-1)!\,g = \Delta^{(k)}$ admits infinitely many zeros in $[a,b]$. But since $g$ is assumed to be nonnegative and nonincreasing, this implies that if $t_0$ is the smallest zero of $g$ in $[a,b]$, then $g \equiv 0$ on $[t_0,b]$. By definition of $t_0$, $g > 0$ on $[a,t_0)$ if $t_0 > a$. ■

Remark 2.3 In the previous lemma, the assumption that $\Delta$ has infinitely many zeros can be weakened. Indeed, we obtain the same conclusion if we assume that $\Delta$ has $k+1$ distinct zeros in $[a,b]$.

Now, we will use the characterization of the LSE $\tilde g_n$ together with the previous lemma to show that it is a finite mixture of Beta$(1,k)$'s. We know from Proposition 2.2 that $\tilde g_n$ is the LSE if and only if
\[
\tilde H_n(t) \ge Y_n(t), \quad \text{for } t \ge 0, \tag{14}
\]
and
\[
\int_0^\infty (\tilde H_n(t) - Y_n(t))\,d\tilde g_n^{(k-1)}(t) = 0, \tag{15}
\]
where
\[
\tilde H_n(t) = \int_0^t \frac{(t-s)^{k-1}}{(k-1)!}\,\tilde g_n(s)\,ds
\]
and
\[
Y_n(t) = \int_0^t \frac{(t-s)^{k-1}}{(k-1)!}\,d\mathbb{G}_n(s).
\]
The condition in (15) implies that $\tilde H_n$ and $Y_n$ have to be equal at any point of increase of the monotone function $(-1)^{k-1}\tilde g_n^{(k-1)}$. Therefore, the set of points of increase of $(-1)^{k-1}\tilde g_n^{(k-1)}$ is included in the set of zeros of the function $\tilde\Delta_n = \tilde H_n - Y_n$. Now, note that $Y_n$ can be given by the explicit expression:
\[
Y_n(t) = \frac{1}{(k-1)!}\,\frac1n \sum_{j=1}^n (t - X_{(j)})_+^{k-1}, \quad \text{for } t \ge 0.
\]
In other words, $Y_n$ is a spline of degree $k-1$ with simple knots $X_{(1)}, \ldots, X_{(n)}$ (for a definition of the multiplicity of knots, see e.g. de Boor (1978), page 96, or DeVore and Lorentz (1993), page 140). Also note that the function $(-1)^{k-1}\tilde g_n^{(k-1)}$ cannot have a positive density with respect to Lebesgue measure $\lambda$. Indeed, if we assume otherwise, then we can find $0 \le j \le n$ and an interval $I \subset (X_{(j)}, X_{(j+1)})$ (with $X_{(0)} = 0$ and $X_{(n+1)} = \infty$) such that $I$ has a nonempty interior, and $\tilde H_n \equiv Y_n$ on $I$. This implies that $\tilde H_n^{(k)} \equiv Y_n^{(k)} \equiv 0$, since $Y_n$ is a polynomial of degree $k-1$ on $I$, and hence $\tilde g_n \equiv 0$ on $I$. But the latter is impossible since it was assumed that $(-1)^{k-1}\tilde g_n^{(k-1)}$ was strictly increasing on $I$. Thus the monotone function $(-1)^{k-1}\tilde g_n^{(k-1)}$ can have only two components: discrete and singular. In the following proposition, we will prove that it is actually discrete with finitely many points of jump.

Proposition 2.3 There exists $m \in \mathbb{N}\setminus\{0\}$, $\tilde a_1, \ldots, \tilde a_m$ and $\tilde w_1, \ldots, \tilde w_m$ such that for all $x > 0$, the LSE $\tilde g_n$ is given by
\[
\tilde g_n(x) = \tilde w_1\,\frac{k(\tilde a_1 - x)_+^{k-1}}{\tilde a_1^k} + \cdots + \tilde w_m\,\frac{k(\tilde a_m - x)_+^{k-1}}{\tilde a_m^k}. \tag{16}
\]

Proof. We need to consider two cases:

(i) The number of zeros of $\tilde\Delta_n = \tilde H_n - Y_n$ is finite. This implies by (15) that the number of points of increase of $(-1)^{k-1}\tilde g_n^{(k-1)}$ is also finite. Therefore, $(-1)^{k-1}\tilde g_n^{(k-1)}$ is discrete with finitely many jumps and hence $\tilde g_n$ is of the form given in (16).

(ii) Now, suppose that $\tilde\Delta_n$ has infinitely many zeros. Let $j$ be the smallest integer in $\{0, \ldots, n-1\}$ such that $[X_{(j)}, X_{(j+1)}]$ contains infinitely many zeros of $\tilde\Delta_n$ (with $X_{(0)} = 0$ and $X_{(n+1)} = \infty$). By Lemma 2.7, if $t_j$ is the smallest zero of $\tilde g_n$ in $[X_{(j)}, X_{(j+1)}]$, then $\tilde g_n \equiv 0$ on $[t_j, X_{(j+1)}]$ and $\tilde g_n > 0$ on $[X_{(j)}, t_j)$ if $t_j > X_{(j)}$. Note that from the proof of Proposition 2.1, we know that the minimizing measure $\tilde\mu_n$ does not put any mass on $(0, X_{(1)}]$, and hence the integer $j$ has to be strictly greater than 0.

Now, by definition of $j$, $\tilde\Delta_n$ has finitely many zeros to the left of $X_{(j)}$, which implies that $(-1)^{k-1}\tilde g_n^{(k-1)}$ has finitely many points of increase in $(0, X_{(j)})$. We also know that $\tilde g_n \equiv 0$ on $[t_j, \infty)$. Thus we only need to show that the number of points of increase of $(-1)^{k-1}\tilde g_n^{(k-1)}$ in $[X_{(j)}, t_j)$ is finite, when $t_j > X_{(j)}$. This can be argued as follows: Consider $z_j$ to be the smallest zero of $\tilde\Delta_n$ in $[X_{(j)}, X_{(j+1)})$. If $z_j \ge t_j$, then we cannot possibly have any point of increase of $(-1)^{k-1}\tilde g_n^{(k-1)}$ in $[X_{(j)}, t_j)$ because it would imply that we have a zero of $\tilde\Delta_n$ that is strictly smaller than $z_j$. If $z_j < t_j$, then for the same reason, $(-1)^{k-1}\tilde g_n^{(k-1)}$ has no point of increase in $[X_{(j)}, z_j)$. Finally, $(-1)^{k-1}\tilde g_n^{(k-1)}$ cannot have infinitely many points of increase in $[z_j, t_j)$ because that would imply that $\tilde\Delta_n$ has infinitely many zeros in $(z_j, t_j)$, and hence by Lemma 2.7, we can find $t_j' \in (z_j, t_j)$ such that $\tilde g_n \equiv 0$ on $[t_j', t_j]$. But this is impossible since $\tilde g_n > 0$ on $[X_{(j)}, t_j)$. ■
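To make the form (16) concrete, here is a minimal Python sketch that evaluates a finite scale mixture of Beta$(1,k)$ kernels and checks numerically that its total mass equals the sum of the weights. The support points, weights and the function name `lse_mixture` are hypothetical values chosen for the example, not output of the estimator.

```python
def lse_mixture(x, supports, weights, k):
    """Evaluate sum_i w_i * k * (a_i - x)_+^{k-1} / a_i^k, the form in (16)."""
    # Kernels with a_i <= x contribute nothing, which implements the (.)_+ truncation.
    return sum(w * k * (a - x) ** (k - 1) / a ** k
               for a, w in zip(supports, weights) if x < a)

supports = [1.0, 2.5]  # hypothetical support points a_i
weights = [0.4, 0.6]   # hypothetical masses w_i
k = 3

# Midpoint rule on [0, max a_i]; each kernel integrates to one, so the total
# mass is sum(weights).
steps = 5000
h = max(supports) / steps
mass = h * sum(lse_mixture((i + 0.5) * h, supports, weights, k) for i in range(steps))
print(mass)
```

For the LSE the weights need not sum to one, so the total mass computed this way is in general only known to be finite.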

Remark 2.4 We have not succeeded in extending Proposition 2.1 to the case $k = \infty$. It is possible to prove the existence of a least squares estimator if the minimization is carried out over $\mathcal{D}_\infty \cap L_2(\lambda)$ rather than $\mathcal{M}_\infty \cap L_2(\lambda)$, but this does not seem (to us) to be the right direction to proceed.

3. Consistency. In this section, we will prove that both the MLE and 
LSE are strongly consistent. Furthermore, we will show that this consistency 
is uniform on intervals of the form [c, oo), where c > 0. 

3.1. Consistency of the maximum likelihood estimator. Consistency of the maximum likelihood estimators for the classes $\mathcal{D}_k$ in the sense of Hellinger convergence of the mixed density is a relatively straightforward consequence of the methods of Pfanzagl (1988), van de Geer (1993), and van de Geer (1996). As usual, the Hellinger distance $H$ is given by $H^2(p,q) = (1/2)\int (\sqrt{p} - \sqrt{q})^2\,d\mu$ for any common dominating measure $\mu$.
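For readers who want to experiment with this distance, the following Python sketch approximates $H^2(p,q)$ for two densities on $(0,\infty)$ by a midpoint rule on a truncated interval. The truncation point, the step count and the two exponential densities are assumptions made for the illustration.

```python
import math

def hellinger_sq(p, q, upper=40.0, steps=20000):
    """Approximate H^2(p, q) = (1/2) * int (sqrt(p) - sqrt(q))^2 dx over (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += (math.sqrt(p(x)) - math.sqrt(q(x))) ** 2
    return 0.5 * h * total

def exp1(x):
    return math.exp(-x)             # Exp(1) density

def exp2(x):
    return 2.0 * math.exp(-2.0 * x)  # Exp(2) density, hypothetical alternative

h2 = hellinger_sq(exp1, exp2)
print(h2)  # closed form for this pair is 1 - 2*sqrt(2)/3
```

With the factor $1/2$ in the definition, $H^2$ takes values in $[0,1]$ and vanishes exactly when the two densities agree almost everywhere.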

Proposition 3.1 Suppose that $\hat g_{n,k}$ is the MLE of $g_0$ in the class $\mathcal{D}_k$, $k \in \{1, \ldots, \infty\}$. Then
\[
H(\hat g_{n,k}, g_0) \to_{a.s.} 0 \quad \text{as } n \to \infty.
\]
Furthermore $\hat F_{n,k} \to_d F_0$ almost surely where $\hat F_{n,k}$ is the MLE of the mixing distribution function $F_0$.

Proof. This follows from the methods of Pfanzagl (1988), van de Geer (1993), and van de Geer (1996), by using the Glivenko-Cantelli preservation theorems of van der Vaart and Wellner (2000). See also van de Geer (1999), page 54, example 4.2.4, and Wellner (2003b), pages 98 to 99.



The following lemma establishes a useful bound for $k$-monotone densities.

Lemma 3.1 If $g$ is a $k$-monotone density function for $k \ge 2$, then
\[
g(x) \le \frac1x\left(1 - \frac1k\right)^{k-1}
\]
for all $x > 0$.

Proof. We have
\begin{align*}
g(x) &= \int_x^\infty \frac{k}{y^k}(y-x)^{k-1}\,dF(y) = \frac1x\int_x^\infty \frac{kx}{y}\left(1 - \frac{x}{y}\right)^{k-1} dF(y) \\
 &\le \frac1x \sup_{x \le y < \infty} \frac{kx}{y}\left(1 - \frac{x}{y}\right)^{k-1} = \frac{k}{x}\sup_{0 < u \le 1} u(1-u)^{k-1} \\
 &= \frac1x\left(1 - \frac1k\right)^{k-1}
\end{align*}
since, with $g_k(u) = u(1-u)^{k-1}$ we have
\[
g_k'(u) = (1-u)^{k-1} - u(k-1)(1-u)^{k-2} = (1-u)^{k-2}(1 - ku)
\]
which equals zero if $u = 1/k$ and this yields a maximum. (Note that when $k = 2$, this bound equals $1/(2x)$ which agrees with the bound given by Jongbloed (1995), page 117 in this case.) ■
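The bound of Lemma 3.1 can be checked numerically on the extreme points of the class, namely the single kernels $k(a-x)_+^{k-1}/a^k$. The Python sketch below is only an illustration under that choice; the function names and the grid are assumptions for the example.

```python
def lemma_bound(x, k):
    """Right-hand side of Lemma 3.1: (1/x) * (1 - 1/k)^(k-1)."""
    return (1.0 - 1.0 / k) ** (k - 1) / x

def beta_kernel(x, a, k):
    """Scaled Beta(1, k) kernel k * (a - x)_+^{k-1} / a^k."""
    return k * (a - x) ** (k - 1) / a ** k if x < a else 0.0

# Largest value of kernel - bound over a grid of x, several k and several scales a.
worst_gap = max(
    beta_kernel(0.01 * i, a, k) - lemma_bound(0.01 * i, k)
    for k in range(2, 7)
    for a in (0.5, 1.0, 3.0)
    for i in range(1, 400)
)
print(worst_gap)
```

The gap is never positive, and the bound is attained at $x = a/k$, which matches the maximizer $u = 1/k$ found in the proof.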

Proposition 3.2 Let $g_0$ be a $k$-monotone density on $(0,\infty)$ and fix $c > 0$. Then
\[
\sup_{x \ge c} |\hat g_n(x) - g_0(x)| \to_{a.s.} 0, \quad \text{as } n \to \infty.
\]

Proof. Let $F_0$ be the mixing distribution function associated with $g_0$. Then for all $x > 0$, we have
\[
g_0(x) = \int_0^\infty \frac{k(t-x)_+^{k-1}}{t^k}\,dF_0(t).
\]
Now, let $Y_1, \ldots, Y_m$ be i.i.d. from $F_0$. Taking $m = n$, let $\mathbb{F}_n$ be the corresponding empirical distribution and $g_n$ the mixed density
\[
g_n(x) = \int_0^\infty \frac{k(t-x)_+^{k-1}}{t^k}\,d\mathbb{F}_n(t), \quad x > 0.
\]



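Let $d > 0$. Using integration by parts, we have for all $x \ge d$
\begin{align*}
|g_n(x) - g_0(x)|
 &= \left| \int_x^\infty k\,\frac{(t-x)^{k-1}}{t^k}\,d(\mathbb{F}_n - F_0)(t) \right| \\
 &= \left| k\int_x^\infty \frac{(k-1)t^k(t-x)^{k-2} - kt^{k-1}(t-x)^{k-1}}{t^{2k}}\,(\mathbb{F}_n - F_0)(t)\,dt \right| \\
 &\le k\left( (k-1)\int_x^\infty \frac{(t-x)^{k-2}}{t^k}\,dt + k\int_x^\infty \frac{(t-x)^{k-1}}{t^{k+1}}\,dt \right) \|\mathbb{F}_n - F_0\|_\infty \\
 &\le k\left( (k-1)\int_d^\infty \frac{(t-d)^{k-2}}{t^k}\,dt + k\int_d^\infty \frac{(t-d)^{k-1}}{t^{k+1}}\,dt \right) \|\mathbb{F}_n - F_0\|_\infty \\
 &= C_d\,\|\mathbb{F}_n - F_0\|_\infty.
\end{align*}
By the Glivenko-Cantelli theorem, the sequence of $k$-monotone densities $(g_n)_n$ satisfies
\[
\sup_{x \in [d,\infty)} |g_n(x) - g_0(x)| \to_{a.s.} 0, \quad \text{as } n \to \infty.
\]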

Since the MLE $\hat g_n$ maximizes the criterion function over the class $\mathcal{M}_k \cap L_1(\lambda)$, we have
\[
\lim_{\epsilon \searrow 0} \frac1\epsilon\bigl(\psi_n((1-\epsilon)\hat g_n + \epsilon g_n) - \psi_n(\hat g_n)\bigr) \le 0,
\]
and this is equivalent to
\[
\int_0^\infty \frac{g_n(x)}{\hat g_n(x)}\,d\mathbb{G}_n(x) \le 1. \tag{1}
\]
Let $\hat F_n$ denote again the MLE of the mixing distribution. By the Helly-Bray theorem, there exists a subsequence $\{\hat F_l\}$ that converges weakly to some distribution function $\hat F$ and hence for all $x > 0$
\[
\hat g_l(x) \to \hat g(x), \quad \text{as } l \to \infty,
\]
where
\[
\hat g(x) = \int_0^\infty \frac{k(t-x)_+^{k-1}}{t^k}\,d\hat F(t), \quad x > 0.
\]
The previous convergence is uniform on intervals of the form $[d,\infty)$, $d > 0$. This follows since $\hat g_l$ and $\hat g$ are monotone and $\hat g$ is continuous.

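Much of the following is along the lines of Jongbloed (1995), pages 117-119, and Groeneboom, Jongbloed, and Wellner (2001b), pages 1674-1675. We are going to show that $\hat g$ and the true density $g_0$ have to be the same. For $0 < \alpha < 1$ define $\eta_\alpha = G_0^{-1}(1-\alpha)$. Fix $\epsilon$ so small that $\epsilon < \eta_\epsilon$. By (1) there is a number $D_\epsilon > 0$ such that $\hat g_l(\eta_\epsilon) \ge D_\epsilon$ for sufficiently large $l$. To see this, note that (1) implies that
\[
1 \ge \int_0^\infty \frac{g_l(x)}{\hat g_l(x)}\,d\mathbb{G}_l(x) \ge \int_{\eta_\epsilon}^\infty \frac{g_l(x)}{\hat g_l(x)}\,d\mathbb{G}_l(x) \ge \frac{1}{\hat g_l(\eta_\epsilon)}\int_{\eta_\epsilon}^\infty g_l(x)\,d\mathbb{G}_l(x),
\]
and hence
\[
\liminf_{l} \hat g_l(\eta_\epsilon) \ge \liminf_{l} \int_{\eta_\epsilon}^\infty g_l(x)\,d\mathbb{G}_l(x) = \int_{\eta_\epsilon}^\infty g_0(x)\,dG_0(x) > 0,
\]
by the choice of $\eta_\epsilon$ and hence we can certainly take $D_\epsilon = \int_{\eta_\epsilon}^\infty g_0(x)\,dG_0(x)/2$.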
Hence, by continuity of gi and the bound in Lemma 3.1 

ftMsi<i-i)'- = 2, 9 ,(,)<i(i-i,-^, 

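$g_l/\hat g_l$ is uniformly bounded on the interval $[\epsilon, \eta_\epsilon]$. That is, there exist two constants $\underline{c}_\epsilon$ and $\overline{c}_\epsilon$ such that for all $x \in [\epsilon, \eta_\epsilon]$
\[
\underline{c}_\epsilon \le \frac{g_l(x)}{\hat g_l(x)} \le \overline{c}_\epsilon.
\]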
In fact,



while 



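using the (uniform) convergence of $g_l$ to $g_0$. Therefore
\[
\frac{g_l(x)}{\hat g_l(x)} \to \frac{g_0(x)}{\hat g(x)}
\]
uniformly on $[\epsilon, \eta_\epsilon]$. For sufficiently large $l$, we have using (1)
\[
\int_\epsilon^{\eta_\epsilon} \frac{g_0(x)}{\hat g(x)}\,d\mathbb{G}_l(x) \le \int_\epsilon^{\eta_\epsilon} \left(\frac{g_l(x)}{\hat g_l(x)} + \epsilon\right) d\mathbb{G}_l(x) \le 1 + \epsilon.
\]
But since $\mathbb{G}_l$ converges weakly to $G_0$, the distribution function of $g_0$, and $g_0/\hat g$ is continuous and bounded on $[\epsilon, \eta_\epsilon]$, we conclude that
\[
\int_\epsilon^{\eta_\epsilon} \frac{g_0(x)}{\hat g(x)}\,dG_0(x) \le 1 + \epsilon.
\]
Now, by Lebesgue's monotone convergence theorem, we conclude that
\[
\int_0^\infty \frac{g_0(x)}{\hat g(x)}\,dG_0(x) \le 1,
\]
which is equivalent to
\[
\int_0^\infty \frac{g_0^2(x)}{\hat g(x)}\,dx \le 1. \tag{2}
\]
Define $\tau = \int_0^\infty \hat g(x)\,dx$. Then $\hat h = \tau^{-1}\hat g$ is a $k$-monotone density. By (2), we have that
\[
\int_0^\infty \frac{g_0^2(x)}{\hat h(x)}\,dx = \tau\int_0^\infty \frac{g_0^2(x)}{\hat g(x)}\,dx \le \tau.
\]
Now consider the function
\[
K(g) = \int_0^\infty \frac{g_0^2(x)}{g(x)}\,dx
\]
defined on the class $\mathcal{G}$ of all continuous densities $g$ on $[0,\infty)$. Minimizing $K$ is equivalent to minimizing
\[
\int_0^\infty \left(\frac{g_0^2(x)}{g(x)} + g(x)\right) dx.
\]
It is easy to see that the integrand is minimized pointwise by taking $g(x) = g_0(x)$. Hence $\inf_{\mathcal{G}} K(g) \ge 1$. In particular, $K(\hat h) \ge 1$ which implies that $\tau = 1$. Now, if $g \ne g_0$ at a point $x$, it follows that $g \ne g_0$ on an interval of positive length. Hence, $g_0 \ne g \Rightarrow K(g) > 1$. We conclude that we have necessarily $\hat h = \hat g = g_0$.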

We have proved that from each subsequence of $\hat g_n$, we can extract a further subsequence that converges to $g_0$ almost surely. The convergence is again uniform on intervals of the form $[c,\infty)$, $c > 0$, by monotonicity of $\hat g_n$ and $g_0$ and continuity of $g_0$. ■

Corollary 3.1 Let $c > 0$. For $j = 1, \ldots, k-2$,
\[
\sup_{x \in [c,\infty)} |\hat g_n^{(j)}(x) - g_0^{(j)}(x)| \to_{a.s.} 0, \quad \text{as } n \to \infty,
\]
and for each $x > 0$ at which $g_0$ is $(k-1)$-times differentiable,
\[
\hat g_n^{(k-1)}(x) \to_{a.s.} g_0^{(k-1)}(x).
\]

Proof. This follows along the lines of the proof in Jongbloed (1995), page 119, and Groeneboom, Jongbloed, and Wellner (2001b), Lemma 3.1, page 1675. ■

3.2. The Least Squares estimator. We also have strong and uniform consistency of the LSE $\tilde g_n$ on intervals of the form $[c,\infty)$, $c > 0$.

Proposition 3.3 Fix $c > 0$ and suppose that the true $k$-monotone density $g_0$ satisfies $\int_0^\infty x^{-1/2}\,dG_0(x) < \infty$. Then $\|\tilde g_n - g_0\|_2 \to_{a.s.} 0$, and
\[
\sup_{x \ge c} |\tilde g_n(x) - g_0(x)| \to_{a.s.} 0, \quad \text{as } n \to \infty.
\]

Proof. The main difficulty here is that we don't know whether the LSE $\tilde g_n$ is a genuine density; i.e. $\tilde g_n \in \mathcal{M}_k$ but not necessarily $\tilde g_n \in \mathcal{D}_k$. Once we show that $\tilde g_n$ stays bounded in $L_2$ with high probability, the proof of consistency will be much like the one used for $k = 2$; i.e., consistency of the LSE of a convex and decreasing density (see Groeneboom, Jongbloed, and Wellner (2001b)). The proof for $k = 2$ is based on the very important fact that the LSE is a density, which helps in showing that $\tilde g_n$ at the last jump point $\tau_n \in [0,\delta]$ of $\tilde g_n'$ for a fixed $\delta > 0$ is uniformly bounded. The proof would have been similar if we only knew that
\[
\int_0^\infty \tilde g_n(x)\,dx = O_p(1).
\]
Here we will first show that $\int_0^\infty \tilde g_n^2\,d\lambda = O(1)$ almost surely. From the last display in the proof of Proposition 2.2
\[
\int_0^\infty \tilde g_n^2(x)\,dx = \int_0^\infty \tilde g_n(x)\,d\mathbb{G}_n(x)
\]
and hence
\[
\sqrt{\int_0^\infty \tilde g_n^2(x)\,dx} = \int_0^\infty \tilde u_n(x)\,d\mathbb{G}_n(x), \tag{3}
\]
where $\tilde u_n = \tilde g_n/\|\tilde g_n\|_2$ satisfies $\|\tilde u_n\|_2 = 1$. Take $\mathcal{F}_k$ to be the class of functions
\[
\mathcal{F}_k = \left\{ g \in \mathcal{M}_k,\ \int_0^\infty g^2\,d\lambda = 1 \right\}.
\]
In the following, we show that $\mathcal{F}_k$ has an envelope $G \in L_1(G_0)$. Note that for $g \in \mathcal{F}_k$ we have
\[
1 = \int_0^\infty g^2\,d\lambda \ge \int_0^x g^2\,d\lambda \ge x\,g^2(x),
\]
since $g$ is decreasing. Therefore
\[
g(x) \le \frac{1}{\sqrt{x}} \equiv G(x)
\]
for all $x > 0$ and $g \in \mathcal{F}_k$; i.e. $G$ is an envelope for the class $\mathcal{F}_k$. Since $G \in L_1(G_0)$ (by our hypothesis) it follows from the strong law that
\[
\int_0^\infty \tilde u_n(x)\,d\mathbb{G}_n(x) \le \int_0^\infty G(x)\,d\mathbb{G}_n(x) \to_{a.s.} \int_0^\infty G(x)\,dG_0(x), \quad \text{as } n \to \infty,
\]
and hence by (3) the integral $\int_0^\infty \tilde g_n^2\,d\lambda$ is bounded (almost surely) by some constant $M_k$.

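Now we are ready to complete the proof. Most of the following arguments are similar to those of the proof of consistency of the LSE when $k = 2$ as given in Groeneboom, Jongbloed, and Wellner (2001b).

Let $\delta > 0$ and $\tau_n$ be the last jump point of $\tilde g_n^{(k-1)}$ if there are jump points in the interval $(0,\delta]$, otherwise we take $\tau_n$ to be 0. To show that the sequence $(\tilde g_n(\tau_n))_n$ stays bounded, we consider two cases:

1. $\tau_n \ge \delta/2$. Let $n$ be large enough so that $\int_0^\infty \tilde g_n^2\,d\lambda \le M_k$. We have
\begin{align*}
\tilde g_n(\tau_n) \le \tilde g_n(\delta/2) &\le (2/\delta)(\delta/2)\,\tilde g_n(\delta/2) \le (2/\delta)\int_0^{\delta/2} \tilde g_n(x)\,dx \\
 &\le (2/\delta)\sqrt{\delta/2}\,\sqrt{\int_0^{\delta/2} \tilde g_n^2(x)\,dx} \le \sqrt{2/\delta}\,\sqrt{\int_0^\infty \tilde g_n^2(x)\,dx} \\
 &\le \sqrt{2M_k/\delta}. \tag{4}
\end{align*}

2. $\tau_n < \delta/2$. We have
\begin{align*}
\int_{\tau_n}^\delta \tilde g_n(x)\,dx &\le \sqrt{\delta - \tau_n}\,\sqrt{\int_{\tau_n}^\delta \tilde g_n^2(x)\,dx} \\
 &\le \sqrt{\delta}\,\sqrt{\int_0^\infty \tilde g_n^2(x)\,dx} \le \sqrt{\delta M_k}.
\end{align*}
Using the fact that $\tilde g_n$ is a polynomial of degree $k-1$ on the interval $[\tau_n, \delta]$ we have
\begin{align*}
\sqrt{\delta M_k} &\ge \int_{\tau_n}^\delta \tilde g_n(x)\,dx \\
 &= (\delta - \tau_n)\left( \tilde g_n(\delta) + \frac{(-1)}{2}\tilde g_n'(\delta)(\delta - \tau_n) + \cdots + (-1)^{k-1}\frac{\tilde g_n^{(k-1)}(\delta)}{k!}(\delta - \tau_n)^{k-1} \right) \\
 &\ge (\delta - \tau_n)\left( \tilde g_n(\delta)\left(1 - \frac1k\right) + \frac1k\,\tilde g_n(\tau_n) \right) \\
 &\ge \frac{\delta}{2k}\,\tilde g_n(\tau_n),
\end{align*}
and hence $\tilde g_n(\tau_n) \le 2k\sqrt{M_k/\delta}$. Therefore, combining the bounds, we have for large $n$
\[
\tilde g_n(\tau_n) \le 2k\sqrt{M_k/\delta} = C_k. \tag{5}
\]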

Now, since $\tilde g_n(\delta) \le \tilde g_n(\tau_n)$, the sequence $\tilde g_n(x)$ is uniformly bounded almost surely for all $x \ge \delta$. Using a Cantor diagonalization argument, we can find a subsequence $\{n_l\}$ so that, for each $x \ge \delta$, $\tilde g_{n_l}(x) \to \tilde g(x)$, as $l \to \infty$. By Fatou's lemma, we have
\[
\int_\delta^\infty (\tilde g(x) - g_0(x))^2\,dx \le \liminf_{l\to\infty} \int_\delta^\infty (\tilde g_{n_l}(x) - g_0(x))^2\,dx. \tag{6}
\]
On the other hand, the characterization of $\tilde g_n$ implies that $Q_n(\tilde g_n) \le Q_n(g_0)$, and this yields
\[
\int_0^\infty (\tilde g_n(x) - g_0(x))^2\,dx \le 2\int_0^\infty (\tilde g_n(x) - g_0(x))\,d(\mathbb{G}_n(x) - G_0(x)).
\]
Thus we can write
\begin{align*}
\int_\delta^\infty (\tilde g_{n_l}(x) - g_0(x))^2\,dx &\le \int_0^\infty (\tilde g_{n_l}(x) - g_0(x))^2\,dx \\
 &\le 2\int_0^\infty (\tilde g_{n_l}(x) - g_0(x))\,d(\mathbb{G}_{n_l}(x) - G_0(x)) \to_{a.s.} 0, \tag{7}
\end{align*}
as $l \to \infty$. The last convergence is justified as follows: since $\int_0^\infty \tilde g_n^2\,d\lambda$ is bounded almost surely, we can find a constant $C > 0$ such that $\tilde g_{n_l} - g_0$ admits $G(x) = C/\sqrt{x}$, $x > 0$, as an envelope. Since $G \in L_1(G_0)$ by hypothesis and since the class of functions $\{(g - g_0)1_{[G \le M]} : g \in \mathcal{M}_k \cap L_2(\lambda)\}$ is a Glivenko-Cantelli class for every $M > 0$ (each element is a difference of two bounded monotone functions), (7) holds. From (6), we conclude that
\[
\int_\delta^\infty (\tilde g(x) - g_0(x))^2\,dx \le 0,
\]
and therefore, $\tilde g \equiv g_0$ on $(0,\infty)$ since $\delta > 0$ can be chosen arbitrarily small. We have proved that there exists $\Omega_0$ with $P(\Omega_0) = 1$ and such that for each $\omega \in \Omega_0$ and any given subsequence $\tilde g_{n_k}(\cdot,\omega)$, we can extract a further subsequence $\tilde g_{n_l}(\cdot,\omega)$ that converges to $g_0$ on $(0,\infty)$. It follows that $\tilde g_n$ converges to $g_0$ on $(0,\infty)$, and this convergence is uniform on intervals of the form $[c,\infty)$, $c > 0$, by the monotonicity and continuity of $g_0$. ■

Corollary 3.2 Let $c > 0$. Under the assumption of Proposition 3.3, we have for $j = 1, \ldots, k-2$,
\[
\sup_{x \in [c,\infty)} |\tilde g_n^{(j)}(x) - g_0^{(j)}(x)| \to_{a.s.} 0, \quad \text{as } n \to \infty,
\]
and for each $x > 0$ at which $g_0$ is $(k-1)$-times differentiable,
\[
\tilde g_n^{(k-1)}(x) \to_{a.s.} g_0^{(k-1)}(x).
\]

Proof. See the proof of Corollary 3.1. ■

4. Asymptotic Minimax risk lower bounds for the rates of convergence. In this section our goal is to derive minimax lower bounds for the behavior of any estimator of a $k$-monotone density $g$ and its first $k-1$ derivatives at a point $x_0$ for which the $k$-th derivative exists and is non-zero. The proof will rely upon the basic Lemma 4.1 of Groeneboom (1996); see also Jongbloed (2000). This basic method seems to go back to Donoho and Liu (1987) and Donoho and Liu (1991). The relationship of our results to other rate results due to Kiefer (1982), Stone (1980), Fan (1991), and Zhang (1990) will be discussed later in the section.

As before, let $\mathcal{D}_k$ denote the class of $k$-monotone densities on $[0,\infty)$. Here is the notation we will need. Consider estimation of the $j$-th derivative of $g \in \mathcal{D}_k$ at $x_0$ for $j \in \{0, 1, \ldots, k-1\}$. If $\hat T_n$ is an arbitrary estimator of the real-valued functional $T$ of $g$, then the ($L_1$-)minimax risk based on a sample $X_1, \ldots, X_n$ of size $n$ from $g$ which is known to be in a suitable subset $\mathcal{D}_{k,n}$ of $\mathcal{D}_k$ is defined by
\[
MMR_1(n, T, \mathcal{D}_{k,n}) = \inf_{t_n} \sup_{g \in \mathcal{D}_{k,n}} E_g|\hat T_n - Tg|.
\]
Here the infimum ranges over all possible measurable functions $t_n : \mathbb{R}^n \to \mathbb{R}$, and $\hat T_n = t_n(X_1, \ldots, X_n)$. When the subclasses $\mathcal{D}_{k,n}$ are taken to be shrinking to one fixed $g_0 \in \mathcal{D}_k$, the minimax risk is called local at $g_0$. The shrinking classes (parametrized by $\tau > 0$) used here are Hellinger balls centered at $g_0$:
\[
\mathcal{D}_{k,n} \equiv \mathcal{D}_{k,n,\tau} = \left\{ g \in \mathcal{D}_k : H^2(g, g_0) = \frac12\int_0^\infty \bigl(\sqrt{g(x)} - \sqrt{g_0(x)}\bigr)^2 dx \le \tau/n \right\}.
\]
The behavior, for $n \to \infty$, of such a local minimax risk $MMR_1$ will depend on $n$ (rate of convergence to zero) and the density $g_0$ toward which the subclasses shrink. The following lemma is the basic tool for proving such a lower bound.

Lemma 4.1 Assume that there exists some subset $\{g_\epsilon : \epsilon > 0\}$ of densities in $\mathcal{D}_{k,n}$ such that, as $\epsilon \downarrow 0$,
\[
H^2(g_\epsilon, g_0) \le \epsilon(1 + o(1)) \quad \text{and} \quad |Tg_\epsilon - Tg_0| \ge (c\epsilon)^r(1 + o(1))
\]
for some $c > 0$ and $r > 0$. Then
\[
\sup_{\tau > 0} \liminf_{n\to\infty} n^r\,MMR_1(n, T, \mathcal{D}_{k,n}) \ge \frac14\left(\frac{cr}{2e}\right)^r.
\]

Proof. See Jongbloed (1995) and Jongbloed (2000). ■
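The constant in Lemma 4.1 is simple to evaluate. The short Python sketch below computes $\frac14 (cr/(2e))^r$ together with the rate exponent $r = (k-j)/(2k+1)$ that appears later in this section; the function names and the sample arguments are assumptions made for the illustration.

```python
import math

def rate_exponent(k, j):
    """Exponent r = (k - j) / (2k + 1) for estimating the j-th derivative."""
    return (k - j) / (2 * k + 1)

def lemma41_lower_constant(c, r):
    """Lower-bound constant (1/4) * (c * r / (2e))^r from Lemma 4.1."""
    return 0.25 * (c * r / (2.0 * math.e)) ** r

# Hypothetical example: k = 2 (convex decreasing case), j = 0, and c = 1.
r = rate_exponent(2, 0)
print(r, lemma41_lower_constant(1.0, r))
```

For $k = 2$ and $j = 0$ the exponent is $2/5$, the familiar rate for a convex decreasing density.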



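Here is the main result of this section:

Proposition 4.1 Let $g_0 \in \mathcal{D}_k$ and $x_0$ be a fixed point in $(0,\infty)$ such that $g_0$ is $k$ times differentiable at $x_0$ ($k \ge 2$). An asymptotic lower bound for the local minimax risk of any estimator $\hat T_{n,j}$ for estimating the functional $T_j g_0 = g_0^{(j)}(x_0)$ is given by:
\[
\sup_{\tau > 0} \liminf_{n\to\infty} n^{\frac{k-j}{2k+1}}\,MMR_1(n, T_j, \mathcal{D}_{k,n}) \ge \left\{ |g_0^{(k)}(x_0)|^{2j+1}\, g_0(x_0)^{k-j} \right\}^{1/(2k+1)} d_{k,j},
\]
where $d_{k,j} > 0$, $j \in \{0, \ldots, k-1\}$. Here
\[
d_{k,j} = \frac14\left(4\,\frac{k-j}{2k+1}\,e^{-1}\right)^{\frac{k-j}{2k+1}} \frac{\lambda_{k,1}^{(j)}}{(\lambda_{k,2})^{\frac{k-j}{2k+1}}},
\]
where
\[
\lambda_{k,2} = \frac{2^{4k+6}(2k+3)(k+2)\,((2(k+1))!)^2}{(k+1)^2\,(4k+7)!\,((k-1)!)^2\binom{k}{k/2-1}^2} \quad \text{when } k \text{ is even}
\]
and
\[
\lambda_{k,2} = \frac{2^{4(k+2)}(2k+3)(k+2)\,((2(k+1))!)^2}{(4k+7)!\,(k!)^2\binom{k+1}{(k-1)/2}^2} \quad \text{when } k \text{ is odd}
\]
and, with $r(x) = (1-x^2)^{k+1}(1+x)$ for $-1 \le x \le 1$ and $C_{k,j} = r^{(j)}(0)$,
\[
\lambda_{k,1}^{(j)} = \left|\frac{C_{k,j}}{C_{k,k}}\right|, \quad 0 \le j \le k-1.
\]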



Proposition 4.1 also yields lower bounds for estimation of the corresponding mixing distribution function $F$ at a fixed point.

Corollary 4.1 Let $g_0 \in \mathcal{D}_k$ and let $x_0$ be a fixed point in $(0,\infty)$ such that $g_0$ is $k$-times differentiable at $x_0$, $k \ge 2$. Then, for estimating $Tg_0 = F_0(x_0)$ where $F_0$ is given in terms of $g_0$ by (3),



The lower bound results in Proposition 4.1 are consistent with the results of Kiefer (1982) and Stone (1980) (although our result involves a slightly stronger lower bound since the supremum is over just a local neighborhood of the truth). In particular, Kiefer showed that rates of convergence in estimation cannot be improved by order restrictions, but that order restrictions might result in improvements of the constants. This latter suggestion has been investigated in detail in the case of monotone densities by Birgé (1987), Birgé (1989). The dependence of our lower bound on the constants $g_0(x_0)$ and $g_0^{(k)}(x_0)$ matches the known results for $k = 1$ and $k = 2$ due to Groeneboom (1985) and Groeneboom, Jongbloed, and Wellner (2001b), and will reappear in the limit distribution theory for $k \ge 3$ in Balabdaoui and Wellner (2004c).

The result of Corollary 4.1 is consistent with the lower bound results of Zhang (1990) and Fan (1991) in the deconvolution setting as we now explain.

To link up with the deconvolution literature we transform our scale mixture problem to a location mixture or deconvolution problem. To do this we will reparametrize our $k$-monotone densities so that the beta kernels converge to the limiting exponential kernels: Note that if
\[
g(x) = \int_0^\infty \frac1y\left(1 - \frac{x}{ky}\right)_+^{k-1} dF(y),
\]
then for $X \sim g$, $Z = Z_k \sim k \times \mathrm{Beta}(1,k)$, and $Y \sim F$ with $Y$ and $Z$ independent, we have
\[
X = ZY.
\]
Thus
\[
X^* \equiv \log X = \log Y + \log Z \equiv Y^* + Z^*.
\]
Hence the density $g^*$ of $X^*$ is given by
\[
g^*(x) = \int_{-\infty}^\infty \left(1 - \frac1k e^{x-y}\right)_+^{k-1} e^{x-y}\,dF^*(y) = \int_{-\infty}^\infty f_{Z_k^*}(x-y)\,dF^*(y)
\]
where $F^*(y) = F(e^y)$ is the distribution function of $Y^*$.

For the completely monotone case corresponding to $k = \infty$, the corresponding formulas for $g$ and $g^*$ are given by
\[
g(x) = \int_0^\infty \frac1y \exp(-x/y)\,dF(y),
\]
and
\[
g^*(x) = \int_{-\infty}^\infty \exp(-e^{x-y})\,e^{x-y}\,dF^*(y) = \int_{-\infty}^\infty f_{Z_\infty^*}(x-y)\,dF^*(y).
\]
According to Fan (1991), we need to compute the characteristic function $\phi_{Z^*}$ and bound its modulus above and below for large arguments. Thus we calculate first for $Z_\infty^*$: from Abramowitz and Stegun (1964), page 930,
\[
\phi_{Z_\infty^*}(t) = \int_{-\infty}^\infty e^{itz} e^{-e^z} e^z\,dz = \int_0^\infty e^{it\log v} e^{-v}\,dv = \Gamma(1 + it).
\]
Thus by Abramowitz and Stegun (1964), page 256,
\[
|\phi_{Z_\infty^*}(t)|^2 = \Gamma(1+it)\Gamma(1-it) = \frac{\pi t}{\sinh(\pi t)} = \frac{2\pi t}{e^{\pi t} - e^{-\pi t}},
\]
and it follows that

V27r|t|exp(-7r|t|/2) < \(f)Y^(t)\ < y/&ir\t\ exp(-7r|*|/2) 

for $|t| \ge 1$. Thus the hypothesis (1.3) of Fan (1991) holds with $\beta = 1$, $\beta_1 = 1/2$ and $\beta_0 = 1/2$. This implies the first hypothesis of Fan's theorem 4, page 1263, and thus we are in the case of a "super-smooth" convolution kernel. Fan's second hypothesis is easily satisfied by the current extreme value distribution function since $f_{Z_\infty^*}(y) = O(|y|^{-2})$ as $y \to \pm\infty$. It therefore follows in the completely monotone case ($k = \infty$) that for estimation of $F^*(y_0) = F(e^{y_0})$ the resulting minimax lower bound yields the rate of convergence $(\log n)^{-1}$. This rate could also be deduced from Zhang (1990), Corollary 3, page 824. (Note that the tail behavior of the characteristic function of our extreme value kernel coincides with the tail behavior of the characteristic function of the Cauchy kernel and that Zhang's example 2 yields the rate $(\log n)^{-1}$ in the case of the Cauchy kernel.)

We can also follow the deconvolution approach to obtain a minimax lower bound for estimation of the mixing distribution in the $k$-monotone case: the characteristic function of $Z_k^* = \log Z_k$ is given by
\begin{align*}
\phi_{Z_k^*}(t) &= \int_{-\infty}^{\log k} e^{itz}\left(1 - \frac1k e^z\right)^{k-1} e^z\,dz = \int_0^k e^{it\log v}(1 - v/k)^{k-1}\,dv \\
 &= \frac{k^{it}\,\Gamma(k+1)\Gamma(1+it)}{\Gamma(k+1+it)}.
\end{align*}
Thus
\begin{align*}
|\phi_{Z_k^*}(t)|^2 &= \frac{k^{it}\,\Gamma(k+1)\Gamma(1+it)}{\Gamma(k+1+it)} \cdot \frac{k^{-it}\,\Gamma(k+1)\Gamma(1-it)}{\Gamma(k+1-it)} \\
 &= \frac{\Gamma(k+1)^2}{(k+it)(k-1+it)\cdots(1+it)(k-it)(k-1-it)\cdots(1-it)} \\
 &= \frac{(k!)^2}{(k^2+t^2)\cdots(1+t^2)} \sim \frac{(k!)^2}{t^{2k}} \quad \text{as } t \to \infty.
\end{align*}
It should also be noted that
\[
\lim_{k\to\infty} |\phi_{Z_k^*}(t)|^2 = \lim_{k\to\infty} \frac{(k!)^2}{(k^2+t^2)\cdots(1+t^2)} = \frac{\pi t}{\sinh(\pi t)} = |\phi_{Z_\infty^*}(t)|^2.
\]
Thus
\[
|t|^k\,|\phi_{Z_k^*}(t)| \to k! \quad \text{as } |t| \to \infty,
\]
and we are in the situation of a smooth convolution kernel of hypothesis (1.4) of Fan (1991), page 1263, with Fan's $\beta = k$ in our setting. Thus Fan's theorem (extended to negative values of $l$) gives our rate of convergence for estimating $F^*(y_0) = F(e^{y_0})$ or $g_0^{(k-1)}$ by taking $l = -1$, $\alpha + m = 0$, and $\beta = k$. By "extending" Fan's theorem further and taking $l = -(k-j)$, we get the rate of convergence $n^{-(k-j)/(2k+1)}$, $j = 1, \ldots, k-1$, for estimation of $g_0^{(j)}(x_0)$.
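The two facts used above, the product formula for $|\phi_{Z_k^*}(t)|^2$ and its limit $\pi t/\sinh(\pi t)$ as $k \to \infty$, can be checked numerically. The Python sketch below is an illustration only; the function names and the chosen values of $t$ and $k$ are assumptions for the example.

```python
import math

def cf_modulus_sq(t, k):
    """(k!)^2 / ((k^2 + t^2) ... (1 + t^2)), written as a stable product."""
    prod = 1.0
    for j in range(1, k + 1):
        prod *= (j * j) / (j * j + t * t)
    return prod

def cf_modulus_sq_limit(t):
    """pi * t / sinh(pi * t), the squared modulus of Gamma(1 + it)."""
    return math.pi * t / math.sinh(math.pi * t)

t = 1.5
finite_k = cf_modulus_sq(t, 200000)
limit = cf_modulus_sq_limit(t)
print(finite_k, limit)

# Polynomial tail for fixed k: |phi|^2 * t^(2k) / (k!)^2 tends to 1 as t grows.
k = 3
tail_ratio = cf_modulus_sq(1000.0, k) * 1000.0 ** (2 * k) / math.factorial(k) ** 2
print(tail_ratio)
```

The first comparison illustrates the passage from the smooth case (polynomial decay, Fan's hypothesis (1.4)) to the super-smooth case (exponential decay) as $k$ grows.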

Proof of Proposition 4.1. Let $\mu$ be a positive number and consider the function $g_\mu$ defined by:
\[
g_\mu(x) = g_0(x) + s(\mu)(x_0 + \mu - x)^{k+1}(x - x_0 + \mu)^{k+2}\,1_{[x_0-\mu,\,x_0+\mu]}(x), \quad x \in (0,\infty),
\]
where $s(\mu)$ is a scale to be determined later. We denote the unscaled perturbation function by $\tilde g_\mu$; i.e.,
\[
\tilde g_\mu(x) = (x_0 + \mu - x)^{k+1}(x - x_0 + \mu)^{k+2}\,1_{[x_0-\mu,\,x_0+\mu]}(x).
\]
If $\mu$ is chosen small enough so that the true density $g_0$ is $k$-times differentiable on $[x_0-\mu, x_0+\mu]$ and $g_0^{(k)}$ is continuous on the latter interval, the perturbed function $g_\mu$ is also $k$-times differentiable on $[x_0-\mu, x_0+\mu]$ with a continuous $k$-th derivative. Now, let $r$ be the function defined on $(0,\infty)$ by
\[
r(x) = (1-x)^{k+1}(1+x)^{k+2}\,1_{[-1,1]}(x) = (1-x^2)^{k+1}(1+x)\,1_{[-1,1]}(x).
\]
Then, we can write $\tilde g_\mu$ as
\[
\tilde g_\mu(x) = \mu^{2k+3}\, r\!\left(\frac{x - x_0}{\mu}\right).
\]
Then for $0 \le j \le k$
\[
g_\mu^{(j)}(x_0) - g_0^{(j)}(x_0) = s(\mu)\,\mu^{2k+3-j}\,r^{(j)}(0).
\]
The scale $s(\mu)$ should be chosen so that for all $0 \le j \le k$
\[
(-1)^j g_\mu^{(j)}(x) > 0, \quad \text{for } x \in [x_0-\mu, x_0+\mu].
\]
But for $\mu$ small enough, the sign of $(-1)^j g_\mu^{(j)}$ will be that of $(-1)^j g_\mu^{(j)}(x_0)$, and hence $g_\mu$ is $k$-monotone. For $j = k$,
\[
g_\mu^{(k)}(x_0) = g_0^{(k)}(x_0) + s(\mu)\,\mu^{k+3}\,r^{(k)}(0).
\]
Assume that $r^{(k)}(0) \ne 0$. Set
\[
s(\mu) = \frac{g_0^{(k)}(x_0)}{r^{(k)}(0)} \times \frac{1}{\mu^{k+3}}.
\]
Then for $0 \le j \le k-1$
\[
(-1)^j g_\mu^{(j)}(x_0) = (-1)^j g_0^{(j)}(x_0) + (-1)^j g_0^{(k)}(x_0)\,\frac{r^{(j)}(0)}{r^{(k)}(0)}\,\mu^{k-j} = (-1)^j g_0^{(j)}(x_0) + o(1)
\]
as $\mu \to 0$, and so we can choose $\mu$ small enough so that $(-1)^j g_\mu^{(j)}(x_0) > 0$. For $j = k$
\[
(-1)^k g_\mu^{(k)}(x_0) = 2(-1)^k g_0^{(k)}(x_0) > 0.
\]
To show that $r^{(j)}(0) \ne 0$ for $0 \le j \le k$, we define
\[
x_{n,m} = \left. \bigl((1-x^2)^n\bigr)^{(m)} \right|_{x=0}.
\]
Let $m \ge 2$ and $2n \ge m$. We have
\begin{align*}
\bigl((1-x^2)^n\bigr)^{(m)} &= \Bigl(\bigl((1-x^2)^n\bigr)'\Bigr)^{(m-1)} \\
 &= \bigl(-2nx(1-x^2)^{n-1}\bigr)^{(m-1)} \\
 &= -2n\Bigl( x\bigl((1-x^2)^{n-1}\bigr)^{(m-1)} + (m-1)\bigl((1-x^2)^{n-1}\bigr)^{(m-2)} \Bigr)
\end{align*}
where in the last equality, we used Leibniz's formula for the derivatives of a product; see e.g. Apostol (1957), page 99. Evaluating the last expression at $x = 0$ yields
\[
x_{n,m} = -2n(m-1)\,x_{n-1,m-2}.
\]
If $m$ is even, we obtain
\begin{align*}
x_{n,m} &= (-2)^{m/2} \prod_{i=0}^{m/2-1}(n-i) \times \prod_{i=0}^{m/2-1}(m-2i-1) \times x_{n-m/2,0} \\
 &= (-2)^{m/2} \prod_{i=0}^{m/2-1}(n-i) \times \prod_{i=0}^{m/2-1}(m-2i-1)
\end{align*}
since $x_{n-m/2,0} = 1$. Similarly, when $m$ is odd, we have
\begin{align*}
x_{n,m} &= (-2)^{(m-1)/2} \prod_{i=0}^{(m-1)/2-1}(n-i) \cdot \prod_{i=0}^{(m-1)/2-1}(m-2i-1) \cdot x_{n-(m-1)/2,1} \\
 &= 0,
\end{align*}
since $x_{n-(m-1)/2,1} = 0$. Now, we have for $1 \le j \le k$
\begin{align*}
r^{(j)}(x) &= \bigl((1-x^2)^{k+1}(1+x)\bigr)^{(j)} \\
 &= (x+1)\bigl((1-x^2)^{k+1}\bigr)^{(j)} + j\bigl((1-x^2)^{k+1}\bigr)^{(j-1)}
\end{align*}
and hence
\[
r^{(j)}(0) = \left.\bigl((1-x^2)^{k+1}\bigr)^{(j)}\right|_{x=0} + j\left.\bigl((1-x^2)^{k+1}\bigr)^{(j-1)}\right|_{x=0}.
\]
Therefore, when $j$ is even, the second term vanishes and
\[
r^{(j)}(0) = (-2)^{j/2} \prod_{i=0}^{j/2-1}(k+1-i) \times \prod_{i=0}^{j/2-1}(j-2i-1) \ne 0.
\]
When $j$ is odd, the first term vanishes and
\begin{align*}
r^{(j)}(0) &= (-2)^{(j-1)/2} \prod_{i=0}^{(j-1)/2-1}(k+1-i) \times j \times \prod_{i=0}^{(j-1)/2-1}(j-2i-2) \\
 &= (-2)^{(j-1)/2} \prod_{i=0}^{(j-1)/2-1}(k+1-i) \times \prod_{i=0}^{(j-1)/2}(j-2i) \ne 0.
\end{align*}
We set
\[
C_{k,j} = r^{(j)}(0), \quad \text{for } 1 \le j \le k.
\]
Then $C_{k,k}$ specializes to
\[
C_{k,k} = \begin{cases}
(-2)^{k/2} \prod_{i=0}^{k/2-1}(k+1-i) \times \prod_{i=0}^{k/2-1}(k-2i-1), & \text{if } k \text{ is even} \\[4pt]
(-2)^{(k-1)/2} \prod_{i=0}^{(k-1)/2-1}(k+1-i) \times \prod_{i=0}^{(k-1)/2}(k-2i), & \text{if } k \text{ is odd.}
\end{cases}
\]
The previous expressions can be given in a more compact form. After some algebra, we find that
\[
C_{k,k} = \begin{cases}
2 \times (-1)^{k/2}(k+1)(k-1)!\binom{k}{k/2-1}, & \text{if } k \text{ is even} \\[4pt]
(-1)^{(k-1)/2}\,k!\binom{k+1}{(k-1)/2}, & \text{if } k \text{ is odd.}
\end{cases}
\tag{1}
\]
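The compact form (1) can be cross-checked with exact integer arithmetic. The Python sketch below expands $r(x) = (1-x^2)^{k+1}(1+x)$ as a polynomial, reads off $r^{(k)}(0) = k!\,[x^k]\,r(x)$, and compares it with (1). The helper names are assumptions made for the example, and the check covers only the values of $k$ in the loop.

```python
import math

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (index = power of x)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def r_derivative_at_zero(k, j):
    """r^{(j)}(0) for r(x) = (1 - x^2)^(k+1) * (1 + x), via the j-th coefficient."""
    poly = [1]
    for _ in range(k + 1):
        poly = poly_mul(poly, [1, 0, -1])  # multiply by (1 - x^2)
    poly = poly_mul(poly, [1, 1])          # multiply by (1 + x)
    return math.factorial(j) * poly[j]

def C_kk_compact(k):
    """Compact expression (1) for C_{k,k}."""
    if k % 2 == 0:
        return 2 * (-1) ** (k // 2) * (k + 1) * math.factorial(k - 1) * math.comb(k, k // 2 - 1)
    return (-1) ** ((k - 1) // 2) * math.factorial(k) * math.comb(k + 1, (k - 1) // 2)

for k in range(1, 11):
    print(k, r_derivative_at_zero(k, k), C_kk_compact(k))
```

Because everything is done in integers, agreement here is exact rather than up to rounding.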



We have for $0 \le j \le k-1$,
\[
|g_\mu^{(j)}(x_0) - g_0^{(j)}(x_0)| = \lambda_{k,1}^{(j)}\,|g_0^{(k)}(x_0)|\,\mu^{k-j},
\]
where we defined $\lambda_{k,1}^{(j)} = |C_{k,j}/C_{k,k}|$ for $j \in \{0, \ldots, k-1\}$. Furthermore
\begin{align*}
\int_0^\infty \frac{(g_\mu(x) - g_0(x))^2}{g_0(x)}\,dx
 &= \frac{(g_0^{(k)}(x_0))^2}{\mu^{2(k+3)}(C_{k,k})^2} \int_{x_0-\mu}^{x_0+\mu} \frac{(x_0+\mu-x)^{2(k+1)}(x-x_0+\mu)^{2(k+2)}}{g_0(x)}\,dx \\
 &= \frac{(g_0^{(k)}(x_0))^2}{\mu^{2(k+3)}(C_{k,k})^2} \int_{-\mu}^{\mu} \frac{(\mu^2-y^2)^{2(k+1)}(y+\mu)^2}{g_0(x_0+y)}\,dy \\
 &= \frac{(g_0^{(k)}(x_0))^2\,\mu^{4(k+1)+3}}{\mu^{2(k+3)}(C_{k,k})^2} \int_{-1}^{1} \frac{(1-z^2)^{2(k+1)}(z+1)^2}{g_0(x_0+\mu z)}\,dz \\
 &= \frac{(g_0^{(k)}(x_0))^2 \int_{-1}^1 (1-z^2)^{2(k+1)}(z+1)^2\,dz}{(C_{k,k})^2\,g_0(x_0)}\,\mu^{2k+1} + O\bigl(\mu^{2k+2}\bigr)
\end{align*}
as $\mu \searrow 0$. This gives control of the Hellinger distance as well in view of Jongbloed (2000), Lemma 2, page 282, or Jongbloed (1995), Corollary 3.2, pages 30 and 31. We set
\[
\lambda_{k,2} = \frac{\int_{-1}^1 (1-z^2)^{2(k+1)}(z+1)^2\,dz}{(C_{k,k})^2}.
\]
The constants $\lambda_{k,2}$ can be given more explicitly using the formula
\[
I_{n,2p} = \int_0^1 (1-x^2)^n x^{2p}\,dx = \frac{2^{2n+1}\,n!\,(n+1)!}{(2n+2)!} \cdot \frac{\binom{n+p}{n+1}}{\binom{2(n+p)+1}{2(n+1)}},
\]
for any integers $n$ and $p$, using the convention
\[
\binom{n+p}{n+1} \Big/ \binom{2(n+p)+1}{2(n+1)} = 1
\]
when $p = 0$. We have,
\[
\int_{-1}^1 (1-x^2)^{2(k+1)}(x+1)^2\,dx = \int_{-1}^1 (1-x^2)^{2(k+1)}x^2\,dx + \int_{-1}^1 (1-x^2)^{2(k+1)}\,dx,
\]
since
\[
\int_{-1}^1 (1-x^2)^{2(k+1)}x\,dx = 0,
\]
and hence
\begin{align*}
\int_{-1}^1 (1-x^2)^{2(k+1)}(x+1)^2\,dx &= 2\bigl(I_{2(k+1),2} + I_{2(k+1),0}\bigr) \\
 &= \frac{2^{4k+6}\,(2(k+1))!\,(2k+3)!}{(4k+6)!} \cdot \frac{1}{4k+7} + \frac{2^{4k+5}\,((2(k+1))!)^2}{(4k+5)!} \\
 &= \frac{2^{4k+5}\,((2(k+1))!)^2}{(4k+6)!}\left(\frac{2(2k+3)}{4k+7} + (4k+6)\right) \\
 &= \frac{2^{4k+5}\,((2(k+1))!)^2}{(4k+7)!}\bigl((4k+6) + (4k+6)(4k+7)\bigr) \\
 &= \frac{2^{4k+5}\,((2(k+1))!)^2}{(4k+7)!}\,(4k+6)(4k+8) \\
 &= 2^{4(k+2)}(2k+3)(k+2)\,\frac{((2(k+1))!)^2}{(4k+7)!}. \tag{2}
\end{align*}
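The closed form (2) can be confirmed in exact rational arithmetic by expanding the integrand and integrating term by term. The following Python sketch is such a check under the stated formula; the helper names are assumptions for the example and only the values of $k$ in the loop are verified.

```python
from fractions import Fraction
from math import factorial

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (index = power of x)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def integral_exact(k):
    """Exact value of int_{-1}^{1} (1 - x^2)^(2(k+1)) * (x + 1)^2 dx."""
    poly = [1]
    for _ in range(2 * (k + 1)):
        poly = poly_mul(poly, [1, 0, -1])  # multiply by (1 - x^2)
    poly = poly_mul(poly, [1, 2, 1])       # multiply by (x + 1)^2
    # Odd powers integrate to zero on [-1, 1]; x^m with m even gives 2 / (m + 1).
    return sum(Fraction(2 * c, m + 1) for m, c in enumerate(poly) if m % 2 == 0)

def integral_closed_form(k):
    """Right-hand side of (2)."""
    num = 2 ** (4 * (k + 2)) * (2 * k + 3) * (k + 2) * factorial(2 * (k + 1)) ** 2
    return Fraction(num, factorial(4 * k + 7))

for k in range(1, 7):
    print(k, integral_exact(k) == integral_closed_form(k))
```

Dividing either value by $(C_{k,k})^2$ then gives $\lambda_{k,2}$ as defined in the text.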

Combining (1) and (2), we find that $\lambda_{k,2}$ is given by
\[
\lambda_{k,2} = \frac{2^{4k+6}(2k+3)(k+2)\,((2(k+1))!)^2}{(k+1)^2\,(4k+7)!\,((k-1)!)^2\binom{k}{k/2-1}^2} \quad \text{when } k \text{ is even}
\]
and
\[
\lambda_{k,2} = \frac{2^{4(k+2)}(2k+3)(k+2)\,((2(k+1))!)^2}{(4k+7)!\,(k!)^2\binom{k+1}{(k-1)/2}^2} \quad \text{when } k \text{ is odd.}
\]

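Now, by using the change of variable $\epsilon = \mu^{2k+1}(b_k + o(1))$, where

\[
b_k = \lambda_{k,2}\,\frac{(g_0^{(k)}(x_0))^2}{g_0(x_0)},
\]

so that $\mu = (\epsilon/b_k)^{1/(2k+1)}(1 + o(1))$, then for $0 \le j \le k-1$, the modulus of
continuity, $m_j$, of the functional $T_j$ satisfies

\[
m_j(\epsilon) \ge \lambda_{k,j,1}\,|g_0^{(k)}(x_0)|
\left(\frac{\epsilon}{b_k}\right)^{(k-j)/(2k+1)}(1 + o(1)).
\]

The result is that

\[
m_j(\epsilon) \ge (r_{k,j}\,\epsilon)^{(k-j)/(2k+1)}(1 + o(1)),
\]

where

\[
r_{k,j} = \frac{\bigl(\lambda_{k,j,1}\,|g_0^{(k)}(x_0)|\bigr)^{(2k+1)/(k-j)}}{b_k},
\]

and hence

\[
\sup_{\tau>0}\,\liminf_{n\to\infty}\, n^{(k-j)/(2k+1)}\,
\mathrm{MMR}_1(n, T_j, \mathcal{D}_{k,n,\tau})
\ge \frac{1}{4}\left(4\,\frac{k-j}{2k+1}\,e^{-1}\right)^{(k-j)/(2k+1)}
(r_{k,j})^{(k-j)/(2k+1)}, \tag{3}
\]

which can be rewritten as

\[
\sup_{\tau>0}\,\liminf_{n\to\infty}\, n^{(k-j)/(2k+1)}\,
\mathrm{MMR}_1(n, T_j, \mathcal{D}_{k,n,\tau})
\ge \frac{1}{4}\left(4\,\frac{k-j}{2k+1}\,e^{-1}\right)^{(k-j)/(2k+1)}
\frac{\lambda_{k,j,1}}{(\lambda_{k,2})^{(k-j)/(2k+1)}}
\left(|g_0^{(k)}(x_0)|^{2j+1}\,g_0(x_0)^{k-j}\right)^{1/(2k+1)}
\]

for $j = 0, \ldots, k-1$.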
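The passage from (3) to its rewritten form is pure algebra. A short worked version follows, assuming $b_k = \lambda_{k,2}\,(g_0^{(k)}(x_0))^2/g_0(x_0)$ and $r_{k,j} = (\lambda_{k,j,1}|g_0^{(k)}(x_0)|)^{(2k+1)/(k-j)}/b_k$ with $\lambda_{k,j,1} = |C_{k,j}/C_{k,k}|$:

```latex
\begin{align*}
(r_{k,j})^{(k-j)/(2k+1)}
&= \lambda_{k,j,1}\,|g_0^{(k)}(x_0)|\; b_k^{-(k-j)/(2k+1)} \\
&= \lambda_{k,j,1}\,|g_0^{(k)}(x_0)|\;
   (\lambda_{k,2})^{-(k-j)/(2k+1)}\,
   |g_0^{(k)}(x_0)|^{-2(k-j)/(2k+1)}\,
   g_0(x_0)^{(k-j)/(2k+1)} \\
&= \frac{\lambda_{k,j,1}}{(\lambda_{k,2})^{(k-j)/(2k+1)}}\,
   |g_0^{(k)}(x_0)|^{(2j+1)/(2k+1)}\,
   g_0(x_0)^{(k-j)/(2k+1)},
\end{align*}
% using 1 - 2(k-j)/(2k+1) = (2j+1)/(2k+1) for the exponent of |g_0^{(k)}(x_0)|.
```

For instance, $k = 1$ and $j = 0$ give the rate $n^{1/3}$ and the dependence $(|g_0'(x_0)|\,g_0(x_0))^{1/3}$, as in the monotone density case. Taking $k = 2$ gives $n^{2/5}$ with $(|g_0''(x_0)|\,g_0(x_0)^2)^{1/5}$ for $j = 0$ and $n^{1/5}$ with $(|g_0''(x_0)|^3\,g_0(x_0))^{1/5}$ for $j = 1$, as in the convex decreasing case.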
5. Preliminary numerical results. From the standard Exponential
distribution Exp(1) we simulated two samples of respective sizes n = 100
and n = 1000. For any fixed k ≥ 1, the Exponential density is k-monotone.
Based on each sample, we computed the LSE and MLE for k = 3 and
k = 6 in both the direct and inverse problems using the iterative (2k − 1)-th
spline algorithm described in Balabdaoui and Wellner (2004b). It should
be noted that the true mixing distribution that corresponds to a standard
Exponential when viewed as a k-monotone density is Gamma(k + 1, 1).
Indeed,

\[
\int_x^\infty \frac{(t-x)^{k-1}}{(k-1)!}\,e^{-(t-x)}\,dt = 1
\]

for all x ≥ 0, and hence

\begin{align*}
\exp(-x) &= \int_x^\infty \frac{(t-x)^{k-1}}{(k-1)!}\,e^{-t}\,dt
          = \int_0^\infty \frac{(t-x)_+^{k-1}}{(k-1)!}\,e^{-t}\,dt \\
         &= \int_0^\infty \frac{k\,(t-x)_+^{k-1}}{t^k}\cdot\frac{1}{k!}\,t^k e^{-t}\,dt,
\end{align*}

where $t \mapsto t^k e^{-t}/k!$ is the Gamma(k + 1, 1) density.
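The mixture identity above can also be checked numerically. The following is a minimal sketch, assuming the scaled Beta kernel $k(t-x)_+^{k-1}/t^k$ and the Gamma(k + 1, 1) density $t^k e^{-t}/k!$. The Simpson quadrature, the truncation point `upper`, and all function names are illustrative choices, not part of the paper's algorithm.

```python
import math


def simpson(f, a, b, m=20000):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def beta_kernel(x, t, k):
    """Scaled Beta kernel k * (t - x)_+^(k-1) / t^k, for t > 0."""
    return k * (t - x) ** (k - 1) / t ** k if t >= x else 0.0


def gamma_density(t, k):
    """Gamma(k + 1, 1) density t^k * exp(-t) / k!."""
    return t ** k * math.exp(-t) / math.factorial(k)


def mixture(x, k, upper=60.0):
    """Integrate kernel * mixing density over t in [x, x + upper], for x > 0.

    The kernel vanishes for t < x, and the tail beyond x + upper is
    negligible because of the exp(-t) factor.
    """
    return simpson(lambda t: beta_kernel(x, t, k) * gamma_density(t, k), x, x + upper)


if __name__ == "__main__":
    for k in (3, 6):
        for x in (0.5, 2.0):
            print(k, x, mixture(x, k), math.exp(-x))
```

The same representation suggests a simple way to simulate from a k-monotone density: draw the mixing variable and multiply it by an independent Beta(1, k) variable.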

For k = 3, the plots in Figures 1 and 2 show the ML and LS estimators 
of the Exponential density (direct problem) and the Gamma distribution 
(inverse problem) based on n = 100 and 1000 respectively. For k = 6, 
similar plots were produced and are shown in Figures 3 and 4. 

Table 1

Table of the obtained LS estimates for k = 3, 6 and n = 100, 1000 and the corresponding
numbers of iterations N_it. A support point is denoted by a and its mass by w.

k, n               N_it   (a, w)
k = 3, n = 100     13     (0.569, 0.0459), (1.829, 0.168), (1.909, 0.0347),
                          (2.839, 0.497), (7.939, 0.027), (7.989, 0.227)
k = 3, n = 1000    14     (0.814, 0.042), (1.674, 0.027), (2.124, 0.300), (3.254, 0.100),
                          (4.924, 0.450), (5.334, 0.001), (8.874, 0.037), (9.934, 0.039)
k = 6, n = 100     4      (2.109, 0.067), (4.999, 0.750), (17.449, 0.190)
k = 6, n = 1000    6      (2.625, 0.017), (3.615, 0.478), (6.575, 0.478), (11.375, 0.262)
Table 2

Table of the obtained ML estimates for k = 3, 6 and n = 100, 1000. A support point is
denoted by a and its mass by w.

k, n               (a, w)
k = 3, n = 100     (0.549, 0.040), (1.259, 0.051), (1.819, 0.072),
                   (2.579, 0.027), (2.589, 0.492), (6.839, 0.314)
k = 3, n = 1000    (0.684, 0.025), (1.664, 0.120), (2.114, 0.184), (3.164, 0.141),
                   (4.794, 0.236), (4.824, 0.184), (8.304, 0.107)
k = 6, n = 100     (3.839, 0.428), (3.849, 0.165), (10.479, 0.405)
k = 6, n = 1000    (3.042, 0.186), (6.452, 0.300), (6.482, 0.267),
                   (11.072, 0.018), (11.102, 0.226)



The figures illustrate consistency in both the direct and inverse problems,
and it can be seen that convergence in the direct problem is faster than it
is in the inverse problem. This is already predicted by the corresponding
theoretical rates of convergence, $n^{-k/(2k+1)}$ and $n^{-1/(2k+1)}$ respectively.
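To get a feel for how far apart these two rates are at the sample sizes used here, one can simply evaluate them. This is only a sketch of the rate factors $n^{-k/(2k+1)}$ and $n^{-1/(2k+1)}$. It ignores the constants, so the numbers are not predictions of the actual estimation errors, and the function names are illustrative.

```python
def direct_rate(n, k):
    """Rate factor n^(-k/(2k+1)) for the direct problem (density estimation)."""
    return n ** (-k / (2 * k + 1))


def inverse_rate(n, k):
    """Rate factor n^(-1/(2k+1)) for the inverse problem (mixing distribution)."""
    return n ** (-1 / (2 * k + 1))


if __name__ == "__main__":
    for k in (3, 6):
        for n in (100, 1000):
            print(f"k={k}, n={n}: direct {direct_rate(n, k):.4f}, "
                  f"inverse {inverse_rate(n, k):.4f}")
```

The inverse rate factor also shrinks more slowly as k grows, since its exponent 1/(2k + 1) decreases with k.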

Note that the number of jump points of the estimators of the mixing
Gamma distribution, which are also the knots of the estimators of the
Exponential density, is smaller for k = 6 than for k = 3: e.g. for n = 1000,
there are 8 jump points for k = 3 versus only 4 when k = 6 (for both
estimators). This was also observed in other simulations, and we obtained even
fewer points for larger values of k. This is not surprising and is rather a
consequence of the fact that the gap between the knots (of order $n^{-1/(2k+1)}$) is
expected to get bigger with k. When k increases, the number of constraints
on the estimated mixed density grows, and hence it becomes harder to
"untangle" the mixing distribution F from the very smooth Beta kernel. Finally,
it should be mentioned that although the MLE and LSE show very small
visible differences in the direct problem, it can easily be checked by
comparing the locations of the jump points or the heights of the jumps that these
estimators are different (compare Table 1 and Table 2).
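The link between jump points of the estimated mixing distribution and knots of the estimated density can be made concrete. The sketch below evaluates the density $\sum_i w_i\,k\,(a_i - x)_+^{k-1}/a_i^k$ implied by a set of support points and masses, using the values transcribed from the k = 3, n = 100 row of Table 1. The function and variable names are illustrative, and the code is not the algorithm used to compute the estimates.

```python
import math


def kmonotone_density(x, support, k):
    """Evaluate sum_i w_i * k * (a_i - x)_+^(k-1) / a_i^k at the point x.

    `support` is a list of (a_i, w_i) pairs. Each a_i is a jump point of the
    mixing distribution and a knot of the resulting spline of degree k - 1.
    """
    return sum(w * k * (a - x) ** (k - 1) / a ** k for a, w in support if a > x)


# LS estimate for k = 3, n = 100, transcribed from Table 1.
LSE_K3_N100 = [
    (0.569, 0.0459), (1.829, 0.168), (1.909, 0.0347),
    (2.839, 0.497), (7.939, 0.027), (7.989, 0.227),
]

if __name__ == "__main__":
    print("number of knots:", len(LSE_K3_N100))
    for x in (0.5, 1.0, 2.0, 4.0):
        print(x, kmonotone_density(x, LSE_K3_N100, 3), math.exp(-x))
```

Each kernel integrates to one over $[0, a_i]$, so the fitted density integrates to the total mass of the support points, and it is identically zero beyond the largest knot.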
[Figure 1 panels: (1a) LSE, k=3, n=100 (direct problem); (1b) MLE, k=3, n=100 (direct problem); (2a) LSE, k=3, n=100 (inverse problem); (2b) MLE, k=3, n=100 (inverse problem).]

Fig 1. Illustration of k-monotone estimation for k = 3 via the ML and LS methods based
on a sample size n = 100. Plots (1a) and (1b) show the LS and ML estimators (dashed
lines) of the exponential density (solid line). Plots (2a) and (2b) show the LS and ML
estimators (dashed line) of Gamma(4,1) (solid line), the true mixing distribution.
[Figure 2 panels: (1a) LSE, k=3, n=1000 (direct problem); (1b) MLE, k=3, n=1000 (direct problem); (2a) LSE, k=3, n=1000 (inverse problem); (2b) MLE, k=3, n=1000 (inverse problem).]

Fig 2. Illustration of k-monotone estimation for k = 3 via the ML and LS methods based
on a sample size n = 1000. Plots (1a) and (1b) show the LS and ML estimators (dashed
lines) of the exponential density (solid line). Plots (2a) and (2b) show the LS and ML
estimators (dashed line) of Gamma(4,1) (solid line), the true mixing distribution.
[Figure 3 panels: (1a) LSE, k=6, n=100 (direct problem); (1b) MLE, k=6, n=100 (direct problem); (2a) LSE, k=6, n=100 (inverse problem); (2b) MLE, k=6, n=100 (inverse problem).]

Fig 3. Illustration of k-monotone estimation for k = 6 via the ML and LS methods based
on a sample size n = 100. Plots (1a) and (1b) show the LS and ML estimators (dashed
lines) of the exponential density (solid line). Plots (2a) and (2b) show the LS and ML
estimators (dashed line) of Gamma(7,1) (solid line), the true mixing distribution.
[Figure 4 panels: (1a) LSE, k=6, n=1000 (direct problem); (1b) MLE, k=6, n=1000 (direct problem); (2a) LSE, k=6, n=1000 (inverse problem); (2b) MLE, k=6, n=1000 (inverse problem).]

Fig 4. Illustration of k-monotone estimation for k = 6 via the ML and LS methods based
on a sample size n = 1000. Plots (1a) and (1b) show the LS and ML estimators (dashed
lines) of the exponential density (solid line). Plots (2a) and (2b) show the LS and ML
estimators (dashed line) of Gamma(7,1) (solid line), the true mixing distribution.

6. Conclusion. In this first part, we have established existence of the
MLE $\hat{g}_n$ and LSE $\tilde{g}_n$ of a k-monotone density $g_0$, and provided
characterizations. We have proved that both estimators are consistent in several senses
as a first step toward understanding their asymptotic behavior. Consistency
of higher derivatives of the estimators is usually not guaranteed in
nonparametric density estimation problems, but here it is obtained "for free"
because of the particular shape constraints and smoothness of the density.
In the sense of pointwise mean absolute error, local asymptotic minimax
lower bounds show that the rate of convergence of the j-th derivative of the
MLE and LSE for $j = 0, \ldots, k-1$ cannot be faster than $n^{-(k-j)/(2k+1)}$.

Parts 3 and 4 are devoted to showing that this rate, modulo a conjecture
about boundedness of the error in a particular Hermite interpolation
problem, is attained by the j-th derivative of the estimators, and that the joint
asymptotic distribution of these derivatives involves a (2k)-convex stochastic
process staying above (below) the (k − 1)-fold integral of two-sided
Brownian motion plus a deterministic drift if k is even (odd). In the joint limiting
distribution, the asymptotic variances are found to have the same
dependence on $g_0(x_0)$ and $|g_0^{(k)}(x_0)|$ as the asymptotic constants obtained in the
minimax lower bounds.